RAGHU INSTITUTE OF TECHNOLOGY
(Autonomous)
Affiliated to JNTU GURAJADA, VIZIANAGARAM
Approved by AICTE, Accredited by NBA, Accredited by NAAC with A+ Grade
Comparative evaluation of traditional machine learning
and deep learning classification techniques for sentiment
analysis
A Project Report By
Batch No: 8
213J1A0509: Baggu. Ramesh
213J1A0517: Bevara. Vandhana
213J1A0531: Chintala. Dinakar
223J5A0504: Chokkokula. Nookaratna
Project Guide: M. Krishna Kishore
Department of CSE
Abstract
• Objective and Scope: The project aims to analyze the sentiment of Twitter data, classifying
tweets into categories such as positive, negative, and neutral to gauge public opinion on
various topics, events, or products.
• Methodology: Utilizing natural language processing (NLP) techniques such as tokenization,
sentiment lexicons, and machine learning models (e.g., Naive Bayes, Support Vector
Machines), the model processes and categorizes tweets for sentiment analysis.
• Applications and Insights: The analysis provides valuable insights into consumer behavior,
political sentiment, or public perception, which can benefit businesses, brands, or
policymakers in decision-making and trend forecasting.
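A minimal sketch of the lexicon-based part of the methodology, assuming a tiny hand-made word list. The names POSITIVE_WORDS, NEGATIVE_WORDS, tokenize, and lexicon_sentiment are illustrative and are not taken from the project code; a real system would load a published sentiment lexicon instead.

```python
import re
from typing import List

# Hypothetical toy lexicon (assumption); not the lexicon used in the project.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def tokenize(tweet: str) -> List[str]:
    """Lowercase the tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def lexicon_sentiment(tweet: str) -> str:
    """Label a tweet by counting positive and negative lexicon hits."""
    tokens = tokenize(tweet)
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("I love this phone, it is great!"))
```

A lexicon scorer like this needs no training data, which makes it a convenient baseline before moving to learned models such as Naive Bayes or SVM.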
Twitter
• Twitter is a social media platform that enables users to post
and interact with messages known as "tweets".
• Twitter serves as a real-time information network, providing
users with instant updates on news, events, trends, and
discussions from around the world.
• Twitter serves as a vast repository of public opinion and
sentiment.
Literature Surveys :
Deep Learning Techniques in Sentiment Analysis
• Survey: "A Survey on Sentiment Analysis in Social Media: Techniques, Applications, and
Challenges" (2022)
• Key Points:
• The paper explores the integration of deep learning techniques (such as Convolutional
Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)) in sentiment analysis,
especially for social media platforms like Twitter.
• Emphasizes the importance of pre-trained models like BERT (Bidirectional Encoder
Representations from Transformers) and its variants (e.g., RoBERTa, DistilBERT) for more
accurate sentiment prediction.
• Discusses challenges in handling noisy and informal language often seen in tweets, like
abbreviations, hashtags, and emojis.
• Conclusion: Deep learning models, while computationally expensive, provide significant
improvements over traditional machine learning models in sentiment analysis on social media.
Advantages:
1.Comprehensive Overview: The survey provides a broad understanding of sentiment analysis
techniques applied to social media data, offering an in-depth look at various approaches, including
traditional methods (like machine learning) and more modern ones (like deep learning).
2.Detailed Categorization of Techniques: The paper categorizes and explains various sentiment
analysis techniques, including rule-based methods, machine learning-based methods, and deep learning
methods. This provides readers with clarity on how different methods work and their relevance to social
media data.
3.Applications in Diverse Fields: The survey discusses various real-world applications of sentiment
analysis, such as in marketing, political analysis, and customer feedback, making it valuable for both
academia and industry. This helps demonstrate the versatility and impact of sentiment analysis across
different domains.
4.Addressing Challenges: The paper highlights key challenges in the field, including issues related to
noisy data, the diversity of languages, sarcasm detection, and handling large-scale data, which provides
a clear understanding of the limitations and obstacles researchers need to overcome.
5.Recent Trends and Advancements: By providing an up-to-date analysis of techniques, applications,
and challenges in 2022, the paper keeps the reader informed of the latest advancements and emerging
trends in sentiment analysis, which is essential in a fast-moving field like social media analytics.
6. Guidance for Future Research: The paper identifies open problems and suggests directions for
future research, which can be helpful for researchers looking to explore novel areas of sentiment analysis.
Disadvantages:
1.Limited Scope of Data Sources: While the paper focuses on social media, it may not cover
sentiment analysis in other types of unstructured text data (e.g., reviews, forums). Expanding to other
sources would provide a more holistic view of the field.
2.Lack of Quantitative Results: Many surveys focus on qualitative discussions without offering
specific quantitative results (e.g., accuracy rates, performance comparisons). The lack of empirical data
or benchmarks may limit readers' ability to assess the effectiveness of different techniques.
3.Overemphasis on Specific Techniques: The paper may focus heavily on certain popular techniques
(such as deep learning or neural networks) and may not delve as much into more traditional or hybrid
techniques, which might still be valuable in certain applications, especially for those with resource
constraints.
4.Insufficient Discussion of Ethical Concerns: The paper might underemphasize important ethical
issues related to sentiment analysis, such as privacy concerns, biases in data, or the potential for misuse
in manipulating public opinion.
5.Generalization of Challenges: Some of the challenges discussed may be too general or not
sufficiently detailed. For instance, while challenges like noisy data and sarcasm detection are mentioned,
they may not be explored in enough depth to provide actionable insights for researchers.
6.Lack of Case Studies: The paper may lack detailed case studies or real-world examples that
demonstrate how sentiment analysis has been applied successfully (or failed) in practice. This would
have provided more practical insights for those interested in implementing these techniques.
Sentiment Analysis on Social Media Data (Twitter-Focused)
• Survey: "Social Media Sentiment Analysis: A Survey of Techniques and Applications" (2022)
• Key Points:
• Focuses specifically on Twitter data, including challenges such as the informal language, use of
slang, hashtags, emojis, and shortened words.
• Explores hybrid approaches that combine traditional machine learning methods (e.g., Naive
Bayes, Support Vector Machines) with modern deep learning methods (e.g., Long Short-Term
Memory Networks (LSTMs), Bidirectional LSTMs).
• Investigates the effectiveness of sentiment analysis models for real-time event monitoring (e.g.,
stock market predictions, political events, or crisis management).
• Conclusion: While traditional models like Naive Bayes still perform well on simpler datasets, deep
learning models and transfer learning techniques (using pre-trained embeddings like Word2Vec,
GloVe, and BERT) are proving to be more accurate for real-time social media analysis.
Advantages:
1.Thorough Coverage of Techniques: The survey provides a detailed exploration of various techniques
for sentiment analysis, including traditional methods like Naive Bayes and Support Vector Machines (SVM),
as well as more recent approaches like deep learning (e.g., LSTMs, CNNs, transformers). This variety gives
readers an understanding of how sentiment analysis has evolved and the strengths of each approach in
different contexts.
2.Real-World Applications: The paper goes beyond just discussing the techniques and delves into their
applications in real-world scenarios, such as brand monitoring, political sentiment analysis, product
reviews, and public opinion tracking. This makes the survey particularly useful for both academic
researchers and industry practitioners interested in applying these methods.
3.Up-to-Date Information: The paper provides a 2022 perspective, which means it includes recent
advancements in the field, such as the use of transformers like BERT and GPT for sentiment analysis, which
are crucial in handling more complex and nuanced data found on social media platforms.
4.Challenges and Limitations: A significant strength of the paper is its discussion of the challenges in
social media sentiment analysis, such as dealing with noisy and unstructured data, understanding sarcasm,
detecting multi-lingual sentiments, and managing massive amounts of data. These challenges are crucial
for researchers to consider when designing or refining sentiment analysis models.
5.Comprehensive Literature Review: The survey provides an extensive review of the literature,
summarizing key studies, methodologies, and findings from previous research. This allows readers to
quickly access the most relevant studies and learn from the existing body of work.
6.Clear Categorization of Approaches: The paper organizes the techniques into clear categories, such
as lexicon-based, machine learning-based, and deep learning-based methods. This clear structure makes it
easy for readers to understand the various methodologies and choose one that fits their needs.
7.Future Directions: By outlining future research directions, such as improving accuracy in handling
ambiguous language or improving cross-domain sentiment analysis, the paper provides valuable guidance
Disadvantages:
1.Limited Empirical Results: While the paper provides a comprehensive theoretical analysis, it may lack
specific empirical comparisons of the techniques mentioned, such as performance metrics (e.g., accuracy,
F1 score) on benchmark datasets. Including such information would allow for a more practical evaluation of
the methods.
2.Focus on Certain Platforms: The survey may focus heavily on the sentiment analysis of popular social
media platforms like Twitter, Facebook, and Instagram. However, there may be limited exploration of niche
platforms or emerging social media networks that might have different challenges or require unique
approaches.
3.Overemphasis on Textual Data: The paper primarily focuses on textual data for sentiment analysis,
but social media sentiment analysis can also involve multimedia content such as images, videos, or audio.
The limited attention to these forms of data might be a disadvantage, as multi-modal sentiment analysis is
becoming more important.
4.Superficial Coverage of Ethical Issues: Ethical considerations, such as privacy concerns, bias in
sentiment analysis models, and the potential misuse of sentiment data for manipulation or surveillance,
may not be addressed in sufficient depth. Given the significant ethical implications of social media data
analysis, a deeper discussion of these issues is needed.
5.Potential Bias in Datasets: Many of the studies referenced in the paper might use popular datasets
like Twitter or Reddit, which could lead to a bias toward certain linguistic styles or demographics. A
discussion on how dataset biases affect model performance and generalizability would have been valuable.
6.Lack of Case Studies: While the paper touches on applications, it may not provide enough real-world
case studies that demonstrate the challenges and successes of sentiment analysis in action. Case studies
would help readers understand how the techniques are applied in practice and how they address the issues
discussed.
7.Not Enough Focus on Hybrid Approaches: Hybrid approaches that combine multiple sentiment
Transfer Learning for Social Media Sentiment
• Survey: "Transfer Learning for Sentiment Analysis: A Survey" (2021)
• Key Points:
• Focuses on the application of transfer learning, where models pre-trained on large corpora
(e.g., BERT, GPT, T5) are fine-tuned on specific sentiment analysis tasks like Twitter
sentiment analysis.
• Transfer learning enables better generalization in handling the idiosyncrasies of social media
language, as models learn from large datasets and can be adapted for small, domain-specific
corpora.
• Also discusses cross-domain and cross-language sentiment transfer, a critical feature when
analyzing global Twitter datasets.
• Conclusion: Transfer learning has revolutionized sentiment analysis, especially in handling diverse
linguistic challenges presented by Twitter data.
Advantages:
1. Focused on Transfer Learning: One of the key strengths of the survey is its deep focus on transfer
learning (TL), which has become an increasingly popular technique in natural language processing (NLP),
especially for sentiment analysis. By concentrating on this specific approach, the paper provides an
in-depth understanding of how TL has evolved and its applications to sentiment classification tasks.
2.Comprehensive Overview of TL Techniques: The paper offers a thorough review of different transfer
learning methodologies used in sentiment analysis, including fine-tuning pre-trained models like BERT,
GPT, and RoBERTa. This allows readers to see the various ways TL has been employed to enhance
sentiment analysis performance, especially with regard to limited labeled data.
3.Practical Applications: The paper addresses a wide range of applications of sentiment analysis using
transfer learning, such as customer feedback analysis, product review analysis, and social media
sentiment monitoring. This makes the survey useful for both researchers and industry practitioners who
may be looking for specific applications of TL in sentiment analysis.
4.Recent Developments: As the paper was published in 2021, it provides up-to-date insights on the
latest trends in transfer learning techniques, including the rise of transformer-based models and their
superior performance compared to traditional methods. This is crucial in a fast-developing field like NLP.
5.Challenges and Limitations of TL in Sentiment Analysis: The survey effectively highlights the
challenges of applying transfer learning to sentiment analysis, such as domain adaptation, overfitting, and
handling imbalanced datasets. These insights are valuable for researchers looking to overcome these
challenges or refine their models.
6.Comparison with Other Approaches: The paper compares transfer learning-based approaches with
traditional machine learning and deep learning techniques. This provides context for why TL has become a
popular choice, illustrating its advantages in terms of performance, especially in low-resource scenarios.
7.Clear Structure and Organization: The paper is well-organized, with sections clearly dedicated to
discussing the basics of transfer learning, its applications in sentiment analysis, the challenges, and
Disadvantages:
1.Lack of Quantitative Comparisons: While the paper discusses various transfer learning techniques, it
may lack direct quantitative comparisons (e.g., performance metrics such as accuracy, F1 score, or
precision/recall) across different models and datasets. Without these metrics, it is harder for readers to
assess the practical effectiveness of these models.
2.Limited Coverage of Non-Textual Data: The paper focuses primarily on textual sentiment analysis
and does not delve into the growing field of multi-modal sentiment analysis, which incorporates other
types of data, such as images, videos, and audio. A broader exploration of how TL can be applied to these
types of data would have made the survey more comprehensive.
3. Overemphasis on Certain Models: The paper may place too much emphasis on popular
transformer-based models like BERT, which, while highly effective, may not always be the best choice for all sentiment
analysis tasks. There could be more discussion on hybrid models, which combine multiple approaches or
transfer learning with other techniques to address specific challenges.
4.Insufficient Exploration of Ethical Issues: The paper does not address the ethical considerations
surrounding the use of transfer learning models in sentiment analysis, such as biases in pre-trained
models, privacy concerns, and the potential misuse of sentiment data. Ethical implications are important in
applied research, and their omission might be seen as a significant gap.
5.Assumes Background Knowledge: The survey assumes that readers are familiar with basic concepts
in machine learning and natural language processing. While this is typical for research surveys, it could
limit the accessibility of the paper to readers without a solid technical background.
6.Challenges May Be Over-Simplified: Although the paper identifies key challenges in applying transfer
learning to sentiment analysis, such as domain adaptation and data imbalance, the discussion might be
somewhat superficial. In-depth solutions to these challenges or more detailed examples could enhance the
paper's value.
7.Potential Bias Toward Popular Datasets: The paper may rely heavily on popular benchmark
Sentiment Analysis Using Multimodal Data (Text + Visual Data)
• Survey: "Multimodal Sentiment Analysis: A Survey" (2023)
• Key Points:
• New approaches are integrating both textual and visual data (such as images and GIFs) in
sentiment analysis, particularly for social media platforms where posts often contain both text
and visual elements.
• Discusses the role of multimodal models (e.g., CLIP, Vision Transformers) that fuse visual and
textual cues to improve sentiment classification accuracy.
• Twitter datasets that include images (e.g., memes, infographics) offer additional layers of
complexity but also provide richer sentiment insights.
• Conclusion: Multimodal sentiment analysis is emerging as an important direction for analyzing
tweets that combine text with visual content.
Advantages:
1.Comprehensive Overview of Multimodal Approaches: The survey offers a thorough exploration of
how sentiment analysis can benefit from combining multiple modalities, such as text, audio, and visual
data. This is particularly useful in understanding how sentiment analysis can be enhanced beyond
text-based approaches, making it a comprehensive resource for multimodal research.
2.Cutting-Edge Techniques: The paper reviews the latest multimodal sentiment analysis methods,
including deep learning models that fuse data from different sources (e.g., transformers, CNNs, LSTMs).
This is important for understanding how new techniques have evolved and their potential for improved
performance over traditional, unimodal approaches.
3.Wide Range of Applications: The survey discusses various real-world applications of multimodal
sentiment analysis, such as social media monitoring, customer feedback analysis, and emotion detection in
videos. This shows the practical impact and versatility of multimodal approaches in different domains,
which adds significant value for industry professionals and researchers.
4.Addressing the Challenges in Multimodal Sentiment Analysis: The paper highlights key
challenges, such as data alignment, modality fusion, and dealing with noisy or incomplete multimodal data.
Identifying these challenges helps researchers understand the complexities involved in working with
multimodal datasets and provides insights into where future improvements are needed.
5.Recent Advancements and Trends: Since the survey was published in 2023, it covers the latest
advancements, including the use of large pre-trained models (e.g., CLIP for vision and language) and
advancements in cross-modal learning. This ensures that readers are up-to-date on the most current
methods and technologies in the field.
6.Future Research Directions: The paper provides valuable insights into areas where future research is
needed, such as enhancing cross-modal fusion techniques, improving multimodal data annotation, and
addressing the ethical challenges in the use of multimodal data. This can guide new research initiatives and
help researchers identify unexplored areas of the field.
7.Clear Categorization and Structure: The paper is well-organized, with sections dedicated to different
Disadvantages:
1.Lack of Detailed Empirical Comparisons: While the paper discusses various multimodal methods, it
may not provide detailed, quantitative comparisons (e.g., performance metrics like accuracy, F1 score,
etc.) of different multimodal models. This is crucial for readers to assess the practical effectiveness of the
different techniques described in the survey.
2.Complexity in Model Implementation: The integration of multiple modalities can result in more
complex models that require large computational resources and extensive fine-tuning. The paper may not
sufficiently address the practical difficulties in implementing and deploying multimodal sentiment analysis
systems, especially for small-scale operations.
3. Overemphasis on Deep Learning Models: The survey may focus predominantly on deep
learning-based approaches, such as CNNs and transformers. While these models are state-of-the-art, they may not
always be the best option for all use cases. A deeper discussion of hybrid or simpler approaches could have
been beneficial, especially for situations with limited data or computational resources.
4.Limited Discussion of Ethical Issues: Ethical concerns, such as privacy issues related to multimodal
data (e.g., voice or image data), algorithmic biases, and the potential misuse of multimodal sentiment
analysis, are only briefly touched upon. Given the sensitivity of multimodal data, a more detailed
exploration of these ethical challenges is needed.
5.Scarcity of Real-World Case Studies: Although the paper covers a wide range of applications, it might
not include enough real-world case studies or examples of how multimodal sentiment analysis is
successfully applied in practice. Including such case studies would provide more concrete examples and
insights into the challenges and benefits of deploying multimodal systems.
6.Data Alignment and Fusion Issues: While the survey identifies challenges in multimodal sentiment
analysis, it may not go into sufficient detail about how to effectively address issues like modality alignment
(i.e., how to synchronize different data types like audio, text, and image) or the optimal ways to combine
the different modalities. A deeper dive into these challenges would have been useful for researchers
looking to develop more effective models.
7.Limited Focus on Low-Resource Scenarios: The paper may place less emphasis on how multimodal
sentiment analysis can be adapted to low-resource environments, where labeled multimodal datasets
might be scarce or computational resources are limited. More discussion on how to tackle these challenges
Real-Time Twitter Sentiment Analysis for Crisis Management
• Survey: "Real-Time Sentiment Analysis of Twitter for Crisis Management" (2023)
• Key Points:
• Focuses on using sentiment analysis to monitor Twitter data during real-time events like
natural disasters, political crises, or public health emergencies.
• Discusses the need for sentiment classifiers that can quickly and accurately classify the mood
or opinion of the public in such situations, helping authorities respond more effectively.
• Introduces novel approaches combining sentiment analysis with event detection algorithms to
identify emerging topics of interest in real-time.
• Conclusion: Real-time sentiment analysis on Twitter can significantly enhance crisis response
strategies by providing a live pulse of public opinion and emotional reactions.
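One way to picture the "live pulse" idea is a sliding window over the most recent classified tweets. The sketch below is an illustration only and is not taken from the survey: the class name RollingSentiment, the window size, and the use of the negative share as the monitored signal are all assumptions.

```python
from collections import deque


class RollingSentiment:
    """Track the share of negative tweets among the most recent `window` tweets."""

    def __init__(self, window: int = 100):
        # A bounded deque drops the oldest label automatically when full.
        self.labels = deque(maxlen=window)

    def negative_share(self) -> float:
        """Return the fraction of tweets in the window labelled negative."""
        if not self.labels:
            return 0.0
        negatives = sum(label == "negative" for label in self.labels)
        return negatives / len(self.labels)

    def update(self, label: str) -> float:
        """Add the label of a newly classified tweet and return the new share."""
        self.labels.append(label)
        return self.negative_share()


monitor = RollingSentiment(window=3)
for label in ["positive", "negative", "negative", "negative"]:
    print(label, round(monitor.update(label), 2))
```

Because the window is bounded, each update costs time proportional to the window size rather than to the whole stream, which suits streaming use.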
Advantages:
1.Timely and Relevant Application: The paper focuses on the real-time application of sentiment
analysis, which is highly relevant in the context of managing crises. Social media, particularly Twitter, is a
key platform for public discourse during crises, and the paper emphasizes the importance of analyzing
sentiments in real-time for effective crisis management and response.
2.Real-World Impact: By focusing on the application of sentiment analysis for crisis management, the
paper addresses a critical, real-world issue. The ability to assess public sentiment quickly and accurately
during crises such as natural disasters, political upheaval, or public health emergencies can significantly
improve decision-making processes for organizations and governments.
3.Detailed Explanation of Methodologies: The paper provides a thorough overview of the techniques
and models used in real-time sentiment analysis, including machine learning methods and deep learning
models, as well as data pre-processing and sentiment classification algorithms. This detailed explanation
is valuable for researchers and practitioners interested in implementing these techniques.
4.Emphasis on Real-Time Analysis: The focus on real-time sentiment analysis is a key strength. Many
sentiment analysis studies focus on batch processing, but real-time systems can be more valuable in crisis
situations where immediate action is necessary. The paper’s emphasis on streaming data processing, fast
analysis, and quick insights is crucial for understanding the time-sensitive nature of crisis management.
5.Application to Crisis Scenarios: The paper highlights how sentiment analysis can be applied to
specific crisis scenarios, such as disaster response or political crisis monitoring. It connects theory with
practice by showing how real-time analysis can help in decision-making and help organizations understand
public sentiment, predict future trends, and shape their responses effectively.
6.Challenges and Limitations: The paper addresses several challenges specific to real-time sentiment
analysis, such as handling noisy data, managing large volumes of tweets, dealing with the nuances of
language (e.g., sarcasm, irony, or informal language), and ensuring that sentiment models can operate
with high accuracy in real-time settings. These insights are important for improving the reliability and
Disadvantages:
1.Lack of Empirical Results: While the paper discusses the methods and techniques for real-time
sentiment analysis, it may lack detailed empirical comparisons of these methods. For example, it may not
present specific performance metrics (e.g., accuracy, F1 score, processing speed) for the various models
used in real-time scenarios. Quantitative comparisons are crucial for evaluating which methods work best
under different crisis conditions.
2.Overemphasis on Twitter: The paper focuses primarily on Twitter as the data source for sentiment
analysis. While Twitter is a major platform, other social media platforms (e.g., Facebook, Instagram,
Reddit) can also provide valuable sentiment data during crises. A broader exploration of how sentiment
analysis can be applied across various platforms would make the survey more comprehensive.
3.Limited Discussion on Ethical Issues: The paper may not delve deeply into the ethical implications
of real-time sentiment analysis on social media data. Concerns such as privacy, data security, and the
potential misuse of sentiment data (e.g., for surveillance or manipulation) are important aspects of the
field, especially in the context of crisis management.
4.Challenges in Handling Noisy Data: The paper discusses the challenges posed by noisy data (e.g.,
irrelevant or off-topic tweets) but may not provide detailed strategies for effectively cleaning and filtering
this data. Noisy data is particularly problematic in real-time settings, and a more in-depth exploration of
filtering techniques and model robustness would have been helpful.
5.Limited Coverage of Multimodal Sentiment Analysis: While the paper focuses on text-based
sentiment analysis, multimodal sentiment analysis (which incorporates images, videos, and audio) can
also be a valuable tool in crisis management. A discussion of how multimodal data could complement
text-based sentiment analysis in crisis situations would have added another layer of insight to the paper.
6.Potential Bias in Data: Social media data, including Twitter, can be biased based on various factors
such as demographic differences, platform usage, and the nature of posts. The paper may not sufficiently
explore how such biases in Twitter data could impact sentiment analysis results or how to mitigate them.
UML DIAGRAMS :
Sentiment Analysis
• Sentiment analysis is a natural language
processing (NLP) technique.
• Helps determine the emotional tone or sentiment
expressed in a piece of text.
• Involves analyzing text data to classify it as
positive, negative, or neutral.
Data Description
1. It contains 1,600,000 tweets extracted using the Twitter API.
2. It contains 6 variables as follows:
sentiment, ids, date, flag, user, text
Data Visualization
[Figure: two plots of the sentiment class distribution; legend: Neutral, Positive, Negative]
As the two plots show, although three sentiment categories are defined for the data, only 2 are
actually present in the data, with no neutral sentiment available.
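The class check described above can be sketched as follows. The mapping of the numeric sentiment values to names (0 = negative, 2 = neutral, 4 = positive) is an assumption and is not stated in this report; the sample values are made up for illustration.

```python
from collections import Counter

# Assumed label mapping (not stated in the report).
LABEL_NAMES = {0: "negative", 2: "neutral", 4: "positive"}


def class_distribution(sentiment_values):
    """Count tweets per class, keeping classes with zero tweets visible."""
    counts = Counter(LABEL_NAMES[value] for value in sentiment_values)
    return {name: counts.get(name, 0) for name in LABEL_NAMES.values()}


sample = [0, 4, 4, 0, 4, 0]  # made-up sentiment values, not project data
print(class_distribution(sample))
```

Reporting a zero count for a missing class, instead of silently omitting it, makes an absent category such as neutral easy to spot before training.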
Descriptive Statistics
Here we can see the various information regarding the dataset we used to perform the analysis.

Information
#   Column      Non-Null Count   Dtype
0   Sentiment   1600000          int64
1   ids         1600000          int64
2   date        1600000          object
3   flag        1600000          object
4   user        1600000          object
5   text        1600000          object

Shape
Rows      Columns
1600000   6

Statistics
#       Sentiment     ids
Count   1.60000e+06   1.600000e+06
Mean    2.00000e+00   1.99881e+09
Std     2.00001e+00   1.93576e+08
Min     0.00000e+00   1.46781e+09
25%     0.00000e+00   1.95691e+09
50%     2.00000e+00   2.02102e+09
75%     4.00000e+00   2.17759e+09
Max     4.00000e+00
Unique values
Column      No of Unique Values
Sentiment   2
ids         1598315
Date        774363
Flag        1
User        659775
Text        1581466

Categorical stats
#           Date         Flag       User      Text
Count       1600000      1600000    1600000   1600000
Unique      774363       1          659775    1581466
Top         Mon Jun 15   NO_QUERY   Log_dog   isPlayer Has Died! sorry
Frequency   20           1600000    549       210

No missing values
Column      No of missing values
Sentiment   0
ids         0
Date        0
Flag        0
User        0
Text        0
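The unique-value and missing-value counts above could be computed along these lines. This is a sketch using only the standard library on a made-up three-row sample in the six-column layout; the function name column_summary is hypothetical, and the project itself may have used a different tool.

```python
import csv
import io

# Made-up rows in the six-column layout (illustration only, not project data).
SAMPLE_CSV = """sentiment,ids,date,flag,user,text
0,1,Mon Jun 15,NO_QUERY,alice,so tired today
4,2,Mon Jun 15,NO_QUERY,bob,great game!
4,3,Tue Jun 16,NO_QUERY,alice,
"""


def column_summary(csv_text):
    """Return {column: (unique_count, missing_count)} for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        # Treat empty strings and absent cells as missing.
        present = [value for value in values if value not in ("", None)]
        summary[column] = (len(set(present)), len(values) - len(present))
    return summary


for column, (unique, missing) in column_summary(SAMPLE_CSV).items():
    print(column, unique, missing)
```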
Processing the Tweets
1 Data Cleaning
We cleaned the data by removing unnecessary columns such as ids, flag, date, and
user.
2 Preprocessing
We cleaned the text column further by removing hyperlinks and special characters.
Then we added several derived columns, such as word count, average word count,
number of characters, and so on.
3 Tokenization
The text of each tweet was then broken into words, and the above steps were
applied to the resulting tokens.
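The three steps above can be sketched as follows. The regular expressions and the function names clean_text, tokenize, and text_features are assumptions for illustration and are not the project's actual code. The slide's "average word count" is interpreted here as average word length, which is also an assumption.

```python
import re
from typing import Dict, List

# Assumed patterns: links starting with http(s) or www, and any
# character that is not a letter, digit, or whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_CHAR_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def clean_text(text: str) -> str:
    """Remove hyperlinks and special characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = SPECIAL_CHAR_PATTERN.sub(" ", text)
    return " ".join(text.split())


def tokenize(text: str) -> List[str]:
    """Break cleaned text into lowercase word tokens."""
    return text.lower().split()


def text_features(text: str) -> Dict[str, float]:
    """Compute simple per-tweet features from cleaned text."""
    words = text.split()
    word_count = len(words)
    total_letters = sum(len(word) for word in words)
    return {
        "word_count": word_count,
        "char_count": len(text),
        # Interpretation of "average word count" as average word length.
        "avg_word_length": total_letters / word_count if word_count else 0.0,
    }


cleaned = clean_text("Loving the new phone!!! http://t.co/abc #happy @sam")
print(cleaned)
print(tokenize(cleaned))
print(text_features(cleaned))
```

Removing links before special characters matters: stripping punctuation first would break a URL into fragments that the link pattern no longer matches.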
Methods
• A variety of methods were used.
• Since the target variable is categorical, only classification
models were needed.
• The classification models used were LSTM (Long Short-Term
Memory), GRU (Gated Recurrent Unit), Random Forest, and SVM.
These models are trained on the preprocessed dataset,
and their performance is recorded and compared. We
also compare performance between the two types of
learning approach: traditional machine learning and deep learning.
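A minimal sketch of how the models' performance could be compared on a shared test set, assuming accuracy, precision, recall, and F1 as the metrics and 4 as the positive label. The function names and the predictions in the usage example are made up for illustration; they are not results from this project.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 4) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one model."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def compare_models(y_true: Sequence[int],
                   predictions: Dict[str, Sequence[int]]) -> Dict[str, Dict[str, float]]:
    """Score every model's predictions against the same true labels."""
    return {name: binary_metrics(y_true, preds)
            for name, preds in predictions.items()}


# Made-up labels and predictions, for illustration only.
y_true = [4, 4, 0, 0]
predictions = {"SVM": [4, 0, 0, 0], "LSTM": [4, 4, 0, 4]}
for name, scores in compare_models(y_true, predictions).items():
    print(name, scores)
```

Scoring all models against the same held-out labels keeps the comparison between the traditional and deep learning models like for like.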
METHODS USED
LSTM
GRU
RANDOM FOREST
SVM
K-NN
Challenges and Limitations
• It can be challenging to understand the platform's informal, colloquial language.
• Even sophisticated natural language processing algorithms can become confused by slang, irony,
and intricate contextual clues.
• Depending on one's viewpoint, the same text might express wildly disparate emotions.
• Annotator bias and interpretation can play a major role in determining the emotional tone of a
tweet. This may result in inconsistent labelling.
• The models cannot reliably distinguish sarcasm from everyday language. Additionally, their capacity
to capture non-linear relationships between words is limited.
Conclusion and Future Directions
In this presentation, we have explored the
capabilities of sentiment analysis on Twitter data.
• In future work, more powerful models could be used
to obtain better results.
• Natural language processing and deep learning models are the most
promising direction for future work.
• This would enable brands to analyze user sentiment properly and
adjust their marketing strategies accordingly. It would also help in
analyzing shifts in public support during election season.