Sarcastic Tweet - MGR

This document discusses using deep learning techniques like CNN and LSTM for classifying sarcastic and non-sarcastic tweets. It outlines the objectives to increase accuracy and reduce time consumption for predicting hate speech. It surveys several papers on related topics, including predicting message popularity on big data and using BiLSTM for multi-label hate speech classification. The existing systems' disadvantages are high redundancy and difficulty extracting optimal opinions. The proposed system is a Tree Convolution Neural Network to address these issues.

Uploaded by

vikramgandhi89

Sarcastic and Non-Sarcastic Tweet Classification Using Deep Learning
Abstract
Nowadays, social media has become a popular channel for people to exchange
opinions through user-generated text.
Exploring how customers' opinions towards products are influenced by friends,
and predicting their future opinions, has attracted great attention from
corporate administrators and researchers.
Various influence models have already been proposed for the opinion prediction
problem.
However, they largely formulate opinions as derived sentiment categories or values
but ignore the role of the content information. Besides, existing models only make
use of the most recently received information without taking into consideration the
long-term historical communication.
To keep track of user opinion behaviors and infer user opinion influence from the
historically exchanged textual information, we develop a content-based sequential
opinion influence framework.
Based on this framework, two opinion sentiment prediction models with alternative
prediction strategies are proposed. In the experiments conducted on three Twitter
datasets, the proposed models outperform other popular influence models.
An interesting finding based on a further analysis of user characteristics is that an
individual's influence is correlated with her/his style of expression.
In our project, we use CNN+LSTM as the existing system and a Tree Convolution Neural
Network (TCNN) as the proposed system.
Introduction to Project
Classification of opinions is becoming very important, as opinions play a
vital role in decision making.
Sentiments expressed via microblogging sites such as Instagram,
Twitter, Facebook, etc. help in understanding the mindset of people.
Several models have been proposed to investigate and extract distinct
opinions from users, where accuracy is important.
Text mining and data mining help to identify patterns and establish
relationships to solve problems in analyzing large datasets.
Tools that are used for data mining allow commercial enterprises to
draw conclusions about future trends.
We propose a content-based sequential opinion influence framework
to incorporate the content information with the historical information
for sentiment prediction in opinion dynamics.
Introduction to Domain
Deep Learning:
In the statistical context, deep learning is defined as an application of
artificial intelligence where available information is used through
algorithms to process or assist the processing of statistical data.
While deep learning involves concepts of automation, it requires
human guidance.
Deep learning involves a high level of generalization in order to get a
system that performs well on as yet unseen data instances.
Deep learning is a relatively new discipline within computer science
that provides a collection of data analysis techniques.
Some of these techniques are based on well-established statistical
methods (e.g. logistic regression and principal component analysis),
while many others are not.
Objective
Increase the efficiency of hate speech prediction.
Reduce time consumption.
Be user-friendly and applicable to all datasets.
Achieve a higher accuracy rate.
Literature Survey 1
Predicting the popularity of messages based on
big data
Publisher: IEEE 2021
Jun Zhou; Guiping Wu
In this work, we systematically and
comprehensively study three types of features:
user features, text features and time features.
Multiple comparison experiments are carried out
on a big data platform.
Experimental results show that time features are
the most valuable features, performing almost as
well as all the features combined, and the popularity of
messages is predicted with a satisfactory
accuracy.
Literature Survey 2
Twitter’s Hate Speech Multi-label Classification Using
Bidirectional Long Short-term Memory (BiLSTM) Method
Publisher: IEEE 2021
Refa Annisatul Ilma
This research was done as an attempt to address the
dangers that hate speech can cause.
Our approach uses multi-label text
classification to predict hate speech with the Bidirectional
Long Short-term Memory (BiLSTM) method.
This multi-label text classification labelled every tweet in
the dataset crawled from Twitter with 12 labels about hate
speech.
From this experiment, we obtained the best
hyperparameter values, which achieved strong performance
with 82.31% accuracy, 83.41% precision, 87.28% recall, and
85.30% F1-score.
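As a quick check on how these metrics relate, the sketch below computes the F1-score as the harmonic mean of precision and recall. The function name `f1_score` is our own, and the survey entry does not say how the scores were averaged across the 12 labels, so this only shows that the reported F1 is consistent with the reported precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the survey entry, in percent.
reported_precision = 83.41
reported_recall = 87.28

# The result agrees with the reported 85.30% F1-score to two decimals.
print(round(f1_score(reported_precision, reported_recall), 2))
```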
Literature Survey 3
A Machine Learning Pipeline to Examine Political Bias
with Congressional Speeches
Publisher: IEEE 2021
Prasad Hajare; Sadia Kamal;
We propose a method to exploit the features of entities
in transcripts collected from political speeches in the US
Congress to label the political bias of social media posts
automatically without any human intervention.
With existing machine learning algorithms we achieve
the highest accuracy of 70.5% and 65.1% to predict
posts on Twitter and Gab data respectively.
We also present a machine learning approach that
combines features from cascades and text to forecast
cascade’s political bias with an accuracy of about 85%.
Literature Survey 4
Better Prevent than React: Deep Stratified Learning to
Predict Hate Intensity of Twitter Reply Chains
Publisher: IEEE 2021
Dhruv Sahnan; Snehil Dahiya;
We propose DRAGNET, a deep stratified learning
framework which predicts the intensity of hatred that a
root tweet can fetch through its subsequent replies.
We extend the collection of social media discourse
from our earlier work, comprising the entire reply chains
up to ∼5k root tweets catalogued into four controversial
topics.
We notice a handful of cases where, despite the root
tweets being non-hateful, the succeeding replies inject
an enormous amount of toxicity into the discussions.
Literature Survey 5
Analyze Hate Contents on Sinhala Tweets using
an Ensemble Method
Publisher: IEEE 2021
Madurangi Guruge;
This work investigates a mechanism for
detecting hate content typed in the Sinhala
language and posted on Twitter.
The proposed supervised mechanism is an
ensemble method that selects the most accurate
result from different models.
An accuracy of 63%, an F1 score of 58%, a
precision of 61% and a recall of 58% were achieved when
predicting hate content.
Existing system-CNN
Convolutional Neural Network:
In deep learning, a convolutional neural network (CNN, or ConvNet) is
a class of deep neural networks, most commonly applied to analyzing
visual imagery.
They are also known as shift invariant or space invariant artificial
neural networks (SIANN), based on their shared-weights architecture
and translation invariance characteristics.
CNNs are regularized versions of multilayer perceptrons.
Multilayer perceptrons usually mean fully connected networks, that is,
each neuron in one layer is connected to all neurons in the next layer.
The "fully-connectedness" of these networks makes them prone to
overfitting data. Typical ways of regularization include adding some
form of magnitude measurement of weights to the loss function.
CNNs take a different approach towards regularization: they take
advantage of the hierarchical pattern in data and assemble more
complex patterns using smaller and simpler patterns.
Therefore, on the scale of connectedness and complexity, CNNs are
on the lower extreme.
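To make the shared-weights and shift-invariance idea concrete, here is a minimal pure-Python sketch of a one-dimensional convolution followed by max-over-time pooling, the form usually applied to token sequences. The function names and the toy integer inputs are assumptions for illustration only; a real model would learn the kernel weights and operate on word embeddings.

```python
from typing import List


def conv1d_valid(sequence: List[float], kernel: List[float]) -> List[float]:
    """Slide one shared kernel over the sequence (no padding, stride 1)."""
    width = len(kernel)
    return [
        sum(sequence[start + k] * kernel[k] for k in range(width))
        for start in range(len(sequence) - width + 1)
    ]


def max_over_time(feature_map: List[float]) -> float:
    """Keep only the strongest response, wherever it occurred."""
    return max(feature_map)


# Hypothetical kernel that responds to the pattern (1, 2).
kernel = [1, 2]

# The same pattern placed at two different positions in the input.
pattern_late = [0, 0, 1, 2, 0]
pattern_early = [1, 2, 0, 0, 0]

# Because the kernel weights are shared across positions, the pooled
# response is the same regardless of where the pattern appears.
print(max_over_time(conv1d_valid(pattern_late, kernel)))
print(max_over_time(conv1d_valid(pattern_early, kernel)))
```

Both calls print the same value, which is the translation invariance the slide refers to.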
Existing system-LSTM
Long Short Term Memory (LSTM):
Long short-term memory (LSTM) is an artificial recurrent neural
network (RNN) architecture used in the field of deep learning.
Unlike standard feed-forward neural networks, LSTM has feedback
connections. It can process not only single data points (such as images),
but also entire sequences of data (such as speech or video). For
example, LSTM is applicable to tasks such as unsegmented, connected
handwriting recognition, speech recognition and anomaly detection in
network traffic or IDSs (intrusion detection systems).
A common LSTM unit is composed of a cell, an input gate, an output
gate and a forget gate.
The cell remembers values over arbitrary time intervals and the three
gates regulate the flow of information into and out of the cell.
LSTM networks are well-suited to classifying, processing and making
predictions based on time series data, since there can be lags of
unknown duration between important events in a time series.
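The role of the cell and the three gates can be sketched as a single time step of a scalar LSTM unit. This is a simplified illustration under our own assumptions: every quantity is a single number rather than a vector, and the `params` layout (one `(w_x, w_h, b)` triple per gate) is hypothetical, not taken from any particular library.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(
    x: float,
    h_prev: float,
    c_prev: float,
    params: Dict[str, Tuple[float, float, float]],
) -> Tuple[float, float]:
    """One time step of a scalar LSTM unit; returns (h, c)."""

    def pre_activation(name: str) -> float:
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    input_gate = sigmoid(pre_activation("input"))    # how much new content to write
    forget_gate = sigmoid(pre_activation("forget"))  # how much old cell state to keep
    output_gate = sigmoid(pre_activation("output"))  # how much of the cell to expose
    candidate = math.tanh(pre_activation("candidate"))

    c = forget_gate * c_prev + input_gate * candidate  # updated cell state
    h = output_gate * math.tanh(c)                     # updated hidden state
    return h, c


# With all weights and biases at zero, every gate evaluates to 0.5 and the
# candidate to 0, so the cell keeps exactly half of its previous value.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

A forget gate close to 1 lets the cell carry a value across many steps, which is how the unit bridges lags of unknown duration.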
Existing system Disadvantages
The current work considers only the influence
from neighbors; it does not capture the
influence of external information sources on
users' opinion behaviors.
Sentence extraction makes it difficult to find the optimal
opinions.
Redundancy is high.
Opinion target extraction and opinion
summarization are very hard.
Proposed system
Tree Convolution Neural Network (Tree CNN):
A recurrent neural network (RNN) is a class of artificial neural
networks where connections between nodes form a graph along a
temporal sequence.
This allows it to exhibit temporal dynamic behavior. Derived from feed-forward
neural networks, RNNs can use their internal state (memory)
to process variable-length sequences of inputs.
This makes them applicable to tasks such as unsegmented, connected
handwriting recognition or speech recognition.
The term “recurrent neural network” is used indiscriminately to refer
to two broad classes of networks with a similar general structure,
where one is finite impulse and the other is infinite impulse. Both
classes of networks exhibit temporal dynamic behavior.
A finite impulse recurrent network is a directed acyclic graph that can
be unrolled and replaced with a strictly feed-forward neural network,
while an infinite impulse recurrent network is a directed cyclic graph
that cannot be unrolled.
Proposed system Working
We propose a novel cross-domain sentiment classification algorithm and a
content-based sentiment analysis algorithm based on term frequency to
analyze the sentiment polarity of short texts.
It expands feature vectors based on unlabeled data from the target domain.
In this way, some important sentiment indicators for the target domain are
appended to the feature vectors.
Finally, the algorithm is validated on one target dataset by using two typical
datasets.
The project mainly focuses on positive and negative sentiment reviews. The
first strategy is to identify the reviews as positive or negative by using the
positive and negative words used in the review comments.
Features are then expanded based on the co-occurrence frequency between a
candidate additional related feature and a domain-independent feature.
Compared with pointwise mutual information, the sentiment-related index
considers the distributions of word occurrences instead of the
co-occurrence frequency between different words, thus surmounting the
challenge caused by infrequent features and words.
The weightage and ranking of each opinion are then calculated using the content
analysis algorithm.
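For reference, the pointwise mutual information (PMI) baseline mentioned above can be sketched as follows. The function name and the toy counts are assumptions for illustration; the slides do not define the sentiment-related index itself, so it is not implemented here.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) from raw occurrence counts.

    count_xy: number of texts containing both words
    count_x, count_y: number of texts containing each word
    total: number of texts in the collection
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("PMI is undefined when any count is zero")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Two words that always appear together in half of 100 texts.
print(pmi(count_xy=50, count_x=50, count_y=50, total=100))

# Two words that co-occur exactly as often as chance predicts.
print(pmi(count_xy=25, count_x=50, count_y=50, total=100))
```

Because PMI depends directly on the co-occurrence count, a pair of rare words seen together only once or twice gets an unreliable score (and an undefined one when the count is zero), which is the weakness with infrequent features that the slide points to.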
Proposed system advantages
Sentence extraction makes it easier to find the
optimal opinions.
Redundancy is lower.
Opinion target extraction and opinion
summarization are much easier.
Hardware used
Processor: Intel i3
RAM: 8 GB
Hard Disk: 500 GB
Monitor: 14 inch
Software used
Operating System: Windows 7/8/10
RDBMS: SQL Server
Web Browser: Internet Explorer/Chrome
Technology: Python
Languages used: Python
Architecture Diagram
DFD Diagram
Use Case Diagram
Modules description
A. Instagram/Twitter API
The user must first register by providing his/her own information. While
registering, the user must give their exact current location. A user who
gives a wrong location is not allowed to register or log in
and is treated as a blocked user.
After registration, the user logs in with a username and password.
The user can then see their profile and view all user tweets. The admin
also logs in with a username and password. If both match, he/she is
considered a valid person. After login, the admin can view all
blocked users who gave a wrong location during registration. The admin can
also see all users' profiles and tweets.
B. Post contents
In this module, a registered user can post tweets to Instagram.
If the user tries to post a tweet that contains bad words, it
will not be posted to the Instagram account; the algorithm
prevents the user from posting bad words.
General posts can be posted in the application as well as on Instagram.
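The posting rule in this module can be sketched as a simple word-list check before a post is accepted. The word list, the function names and the in-memory `timeline` are all hypothetical placeholders; the slides do not show the actual list or how posts are sent to Instagram.

```python
import re
from typing import List, Set

# Hypothetical placeholder list; the project's real bad-word dataset is not shown.
BAD_WORDS: Set[str] = {"idiot", "stupid"}


def contains_bad_word(text: str, bad_words: Set[str] = BAD_WORDS) -> bool:
    """Return True if any token of the text is in the bad-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in bad_words for token in tokens)


def try_post(text: str, timeline: List[str]) -> bool:
    """Append the post to the timeline only if it passes the filter."""
    if contains_bad_word(text):
        return False  # post is rejected and never reaches the account
    timeline.append(text)
    return True


timeline: List[str] = []
print(try_post("Have a nice day", timeline))
print(try_post("You are an IDIOT!", timeline))
print(timeline)
```

Matching on whole tokens rather than substrings avoids rejecting harmless words that merely contain a listed word.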
C. Search Query
In this module, the user can search for any query in the application. The query
is processed and live tweets are extracted from Twitter in real time. The
100 tweets related to the keywords are extracted from live Twitter.
D. Preprocessing
In this step, all the tweets extracted from Twitter are processed and the
noisy data are removed.
1) Stop Words Removal: A dictionary-based approach is used to remove
stop words from tweets. A generic stop word list containing 75 stop words,
created using a hybrid approach, is used. The algorithm is implemented in the
following steps. The target text is tokenized and the individual words are stored in an array.
A single stop word is read from the stop word list. The stop word is compared to
the target text in the array using a sequential search. If it matches, the
word in the array is removed, and the comparison continues until the end of the array.
After the stop word has been removed completely, another stop word is read from the stop
word list, and the algorithm runs again until all the stop words have been
compared.
The resultant text, devoid of stop words, is displayed along with the required statistics:
the stop words removed, the number of stop words removed from the target text, the total count of
words in the target text, the count of words in the resultant text, and the individual stop word
counts found in the target text.
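The stop word removal steps can be sketched as below. The five-word list is a stand-in for the 75-word list mentioned in the slides, which is not reproduced there, and the function and key names are our own. The loop follows the described procedure (one stop word at a time, sequential scan of the token array); in practice a set lookup per token would do the same work in a single pass.

```python
from typing import Dict, List, Tuple

# Placeholder list; the 75-word list used in the project is not shown.
STOP_WORDS: List[str] = ["the", "is", "a", "of", "and"]


def remove_stop_words(
    text: str, stop_words: List[str] = STOP_WORDS
) -> Tuple[List[str], Dict[str, object]]:
    """Remove stop words and return the remaining tokens plus statistics."""
    tokens = text.lower().split()  # tokenize into an array of words
    total_before = len(tokens)
    removed_counts: Dict[str, int] = {}

    for stop_word in stop_words:   # read one stop word at a time
        kept = []
        for token in tokens:       # sequential search through the array
            if token == stop_word:
                removed_counts[stop_word] = removed_counts.get(stop_word, 0) + 1
            else:
                kept.append(token)
        tokens = kept

    stats = {
        "total_words": total_before,
        "remaining_words": len(tokens),
        "stop_words_removed": total_before - len(tokens),
        "per_stop_word": removed_counts,
    }
    return tokens, stats


tokens, stats = remove_stop_words("The cat is on the mat")
print(tokens)
print(stats)
```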
2) Stemming Technique: After removing the unwanted
words from the tweet, stemming is applied.
Stemming is the process of reducing inflected (or
sometimes derived) words to their word stem, base or
root form, generally a written word form. The stem need
not be identical to the morphological root of the word; it
is usually sufficient that related words map to the same
stem, even if this stem is not in itself a valid root.
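A toy suffix-stripping stemmer illustrates the point that related words only need to map to the same stem. This is a deliberately naive sketch with a hypothetical suffix list, not the Porter algorithm or whatever stemmer the project actually uses.

```python
# Hypothetical suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ("ing", "ed", "es", "ly", "s")


def simple_stem(word: str, min_stem_length: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_length letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word


# Related forms map to one stem.
print([simple_stem(w) for w in ("connected", "connecting", "connects")])

# The shared stem need not be a valid word.
print([simple_stem(w) for w in ("studies", "studied")])
```

The first line yields the same stem for all three forms; the second yields "studi" twice, a stem that is not itself a dictionary word but still groups the related forms together.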
E. Classification
After the stemming process, all the tweet terms containing
the keyword are classified into positive, negative and
neutral tweets. CNN and SVM are used for classification.
Here we have datasets of good words and bad words.
By comparing the posts against these, we can classify them into
positive, negative and neutral tweets.
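The word-list comparison described here can be sketched as a simple count of good and bad words per tweet. The two word sets and the tie-breaking rule (equal counts give "neutral") are assumptions for illustration; this covers only the lexicon comparison, not the CNN or SVM classifiers.

```python
import re
from typing import Set

# Hypothetical stand-ins for the project's good-word and bad-word datasets.
POSITIVE_WORDS: Set[str] = {"good", "great", "love", "happy"}
NEGATIVE_WORDS: Set[str] = {"bad", "terrible", "hate", "sad"}


def classify_tweet(text: str) -> str:
    """Label a tweet as positive, negative or neutral by word-list matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


for tweet in (
    "I love this great phone",
    "I hate this terrible phone",
    "This phone arrived today",
):
    print(tweet, "->", classify_tweet(tweet))
```

A purely lexical rule like this cannot detect sarcasm, where positive words carry a negative meaning, which is one motivation for using learned models on top of it.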
Conclusion
The experiments conducted on the Twitter dataset
demonstrate the effectiveness of the two
proposed models.
The prediction ability of the proposed model is
further verified on the opinion word prediction
task.
Based on the learned influence, we explore the
expression styles of users with different influence
powers, which provide valuable information for
people to manage their accounts and design
marketing plans.
