This document discusses using deep learning techniques like CNN and LSTM for classifying sarcastic and non-sarcastic tweets. It outlines the objectives to increase accuracy and reduce time consumption for predicting hate speech. It surveys several papers on related topics, including predicting message popularity on big data and using BiLSTM for multi-label hate speech classification. The existing systems' disadvantages are high redundancy and difficulty extracting optimal opinions. The proposed system is a Tree Convolution Neural Network to address these issues.
Sarcastic Tweet - MGR
Sarcastic and Non-Sarcastic Tweet Classification Using Deep Learning

Abstract

Social media has become a popular channel for people to exchange opinions through user-generated text. Understanding how customers' opinions about products are influenced by friends, and predicting their future opinions, has attracted great attention from corporate administrators and researchers. Various influence models have already been proposed for the opinion prediction problem. However, they largely formulate opinions as derived sentiment categories or values and ignore the role of the content information. Moreover, existing models use only the most recently received information and do not take long-term historical communication into account. To keep track of user opinion behaviors and infer user opinion influence from the historical exchanged textual information, we develop a content-based sequential opinion influence framework. Based on this framework, two opinion sentiment prediction models with alternative prediction strategies are proposed. In experiments conducted on three Twitter datasets, the proposed models outperform other popular influence models. An interesting finding from a further analysis of user characteristics is that an individual's influence is correlated with his or her style of expression. In our project, CNN+LSTM serves as the existing system and a Tree Convolution Neural Network (TCNN) as the proposed system.

Introduction to Project

Classification of opinions is becoming very important because opinions play a vital role in decision making. Sentiments expressed on microblogging sites such as Instagram, Twitter, and Facebook help in understanding people's mindset. Several models have been proposed to investigate and extract distinct opinions from users, where accuracy is important. Text mining and data mining help to identify patterns and establish relationships when analyzing large datasets.
Data mining tools allow commercial enterprises to draw conclusions about future trends. We propose a content-based sequential opinion influence framework that combines content information with historical information for sentiment prediction in opinion dynamics.

Introduction to Domain

Deep Learning: In the statistical context, deep learning is defined as an application of artificial intelligence in which available information is used by algorithms to process, or assist the processing of, statistical data. Although deep learning involves concepts of automation, it requires human guidance. It also requires a high level of generalization so that the resulting system performs well on unseen data instances. Deep learning is a relatively new discipline within computer science that provides a collection of data analysis techniques. Some of these techniques are based on well-established statistical methods (e.g. logistic regression and principal component analysis), while many others are not.

Objective

Increase the efficiency of hate speech prediction.
Reduce time consumption.
Be user friendly and applicable to all datasets.
Achieve a higher accuracy rate.

Literature Survey 1

Predicting the popularity of messages based on big data
Publisher: IEEE, 2021
Jun Zhou; Guiping Wu

In this work, we systematically and comprehensively study three types of features: user features, text features, and time features. Multiple comparison experiments are carried out on a big data platform. Experimental results show that time features are the most valuable features, with an effect almost equal to that of all the features combined, and that the popularity of messages is predicted with satisfactory accuracy.
Literature Survey 2

Twitter's Hate Speech Multi-label Classification Using Bidirectional Long Short-term Memory (BiLSTM) Method
Publisher: IEEE, 2021
Refa Annisatul Ilma

This research is an attempt to address the dangers posed by hate speech. We use multi-label text classification to predict hate speech with the Bidirectional Long Short-term Memory (BiLSTM) method. Every tweet in the dataset crawled from Twitter was labelled with 12 labels about hate speech. From this experiment, we obtained the best hyperparameter values, which achieved 82.31% accuracy, 83.41% precision, 87.28% recall, and 85.30% F1-score.

Literature Survey 3

A Machine Learning Pipeline to Examine Political Bias with Congressional Speeches
Publisher: IEEE, 2021
Prasad Hajare; Sadia Kamal;

We propose a method that exploits the features of entities in transcripts collected from political speeches in the US Congress to label the political bias of social media posts automatically, without any human intervention. With existing machine learning algorithms, we achieve highest accuracies of 70.5% and 65.1% when predicting posts on Twitter and Gab data, respectively. We also present a machine learning approach that combines features from cascades and text to forecast a cascade's political bias with an accuracy of about 85%.

Literature Survey 4

Better Prevent than React: Deep Stratified Learning to Predict Hate Intensity of Twitter Reply Chains
Publisher: IEEE, 2021
Dhruv Sahnan; Snehil Dahiya;

We propose DRAGNET, a deep stratified learning framework that predicts the intensity of hatred that a root tweet can fetch through its subsequent replies. We extend the collection of social media discourse from our earlier work, comprising the entire reply chains up to ∼5k root tweets catalogued into four controversial topics.
We notice a handful of cases where, despite the root tweets being non-hateful, the succeeding replies inject an enormous amount of toxicity into the discussions.

Literature Survey 5

Analyze Hate Contents on Sinhala Tweets using an Ensemble Method
Publisher: IEEE, 2021
Madurangi Guruge;

This work investigates a mechanism for detecting hate content typed in the Sinhala language and posted on Twitter. The proposed supervised mechanism is an ensemble method that selects the most accurate result from different models. An accuracy of 63%, an F1 score of 58%, a precision of 61%, and a recall of 58% were achieved when predicting hate content.

Existing system-CNN

Convolution Neural Network: In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs are also known as shift-invariant or space-invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons are usually fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The "fully-connectedness" of these networks makes them prone to overfitting data. Typical ways of regularization include adding some form of magnitude measurement of weights to the loss function. CNNs take a different approach to regularization: they take advantage of the hierarchical pattern in data and assemble more complex patterns from smaller and simpler patterns. Therefore, on the scale of connectedness and complexity, CNNs are at the lower extreme.

Existing system-LSTM

Long Short Term Memory (LSTM): Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feed-forward neural networks, an LSTM has feedback connections.
It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, and anomaly detection in network traffic or intrusion detection systems (IDSs). A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. LSTM networks are well suited to classifying, processing, and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series.

Existing system Disadvantages

The current work considers only the influence from neighbors; it does not yet capture the influence of external information sources on users' opinion behaviors.
Finding the optimal opinions through sentence extraction is difficult.
Redundancy is high.
Opinion target extraction and opinion summarization are very hard.

Proposed system

Tree Convolution Neural Network (Tree CNN): The proposed network is described here in recurrent terms. A recurrent neural network (RNN) is a class of artificial neural networks in which connections between nodes form a graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feed-forward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used indiscriminately to refer to two broad classes of networks with a similar general structure, where one is finite impulse and the other is infinite impulse. Both classes of networks exhibit temporal dynamic behavior.
A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feed-forward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.

Proposed system Working

We propose a novel cross-domain sentiment classification algorithm and a content-based sentiment analysis algorithm, both based on term frequency, to analyze the sentiment polarity of short texts. The approach expands feature vectors using unlabeled data from the target domain. In this way, some important sentiment indicators for the target domain are appended to the feature vectors. Finally, the algorithm is validated on one target dataset by using two typical datasets. The project focuses mainly on positive and negative sentiment reviews. The first strategy is to identify reviews as positive or negative from the positive and negative words used in the review comments. Features are then expanded based on the co-occurrence frequency between a candidate additional related feature and a domain-independent feature. Compared with pointwise mutual information, the sentiment related index considers the distributions of word occurrences instead of the co-occurrence frequency between different words, thus surmounting the challenge caused by infrequent features and words. The weightage and ranking of each opinion are then calculated using the content analysis algorithm.

Proposed system advantages

Finding the optimal opinions through sentence extraction is not difficult.
Redundancy is low.
Opinion target extraction and opinion summarization are very easy.

Hardware used

Processor: i3
RAM: 8 GB
Hard disk: 500 GB
Monitor: 14 inch

Software used

Operating System: Windows 7/8/10
RDBMS: SQL Server
Web Browser: Internet Explorer/Chrome
Technology: Python
Languages used: Python

Architecture Diagram

DFD Diagram

Usecase Diagram

Modules description

A.
Instagram/Twitter API

A user must first register by providing his or her own information, including the exact current location. A user who gives a wrong location is not allowed to register or log in and is treated as a blocked user, so only the current location may be given. After registration, the user logs in with a username and password and can then see his or her profile and view all user tweets. The admin also logs in with a username and password; if both match, the admin is considered a valid person. After login, the admin can view all blocked users who gave a wrong location during registration, as well as all user profiles and tweets.

B. Post contents

In this module, a registered user can post tweets to Instagram. If the user tries to post a tweet that contains bad words, it is not posted to the Instagram account, so the algorithm prevents the user from posting bad words. General posts can be posted both in the application and on Instagram.

C. Search Query

In this module, the user can search for any query in the application. The query is processed and live tweets are extracted from Twitter in real time. The 100 tweets related to the keywords are extracted from live Twitter.

D. Preprocessing

In this step, all the tweets extracted from Twitter are processed and the noisy data are removed.

1) Stop words Removal: A dictionary-based approach is used to remove stop words from tweets. A generic stop word list containing 75 stop words, created using a hybrid approach, is used. The algorithm is implemented in the following steps. The target text is tokenized and the individual words are stored in an array. A single stop word is read from the stop word list. The stop word is compared to the target text, in the form of the array, using a sequential search. If it matches, the word is removed from the array, and the comparison continues until the end of the array.
After that stop word has been removed completely, another stop word is read from the stop word list, and the algorithm runs again until all the stop words have been compared. The resulting text, devoid of stop words, is displayed, together with the required statistics: the stop words removed, the number of stop words removed from the target text, the total count of words in the target text, the count of words in the resulting text, and the count of each individual stop word found in the target text.

2) Stemming Technique: After the unwanted words have been removed from the tweet, stemming is applied. Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

E. Classification

After stemming, all the tweet terms containing the keyword are classified into positive, negative, and neutral tweets. CNN and SVM are used for classification. We have datasets of good words and bad words; by comparing the tweet terms with these, we can classify the posts into positive, negative, and neutral tweets.

Conclusion

The experiments conducted on the Twitter dataset demonstrate the effectiveness of the two proposed models. The prediction ability of the proposed model is further verified on the opinion word prediction task. Based on the learned influence, we explore the expression styles of users with different influence powers, which provides valuable information for people to manage their accounts and design marketing plans.
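As a rough illustration of modules D and E described earlier (stop word removal by sequential search, stemming, and comparison against good-word and bad-word lists), the steps might be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the word lists are tiny hypothetical stand-ins for the 75-word stop list and the good/bad word datasets, `simple_stem` is a crude suffix stripper standing in for a real stemmer, and the lexicon comparison replaces the CNN and SVM classifiers, which are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical miniature word lists; the project's actual 75-word stop list
# and its good/bad word datasets are not given in the document.
STOP_WORDS = ["the", "is", "a", "to", "and", "of", "this"]
GOOD_WORDS = ["good", "great", "love", "happy"]
BAD_WORDS = ["bad", "hate", "terrible", "awful"]


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Tokenize the text, then remove one stop word at a time by sequential search.

    Returns the remaining tokens and a Counter of how often each stop word
    was removed (the per-word statistics mentioned in the module description).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    removed = Counter()
    for stop_word in stop_words:      # read a single stop word from the list
        kept = []
        for token in tokens:          # compare it against every token in turn
            if token == stop_word:
                removed[stop_word] += 1
            else:
                kept.append(token)
        tokens = kept
    return tokens, removed


def simple_stem(word):
    """Strip the first matching suffix; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Stem the lexicons with the same function so related word forms map together.
GOOD_STEMS = {simple_stem(w) for w in GOOD_WORDS}
BAD_STEMS = {simple_stem(w) for w in BAD_WORDS}


def classify_tweet(text):
    """Label a tweet positive, negative, or neutral by counting lexicon hits."""
    tokens, _ = remove_stop_words(text)
    stems = [simple_stem(t) for t in tokens]
    score = sum(s in GOOD_STEMS for s in stems) - sum(s in BAD_STEMS for s in stems)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sample = "This is a great movie and the cast is loved"
    tokens, removed = remove_stop_words(sample)
    print(tokens, dict(removed), classify_tweet(sample))
```

The nested loop mirrors the sequential search described in the preprocessing module rather than a faster set lookup. The suffix stripper conflates some unrelated words (for example, "hat" and "hate" share a stem), which is one reason an established stemmer would be preferable in practice.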