Cyberbullying Detection Through Sentiment Analysis


2020 International Conference on Computational Science and Computational Intelligence (CSCI)

Jalal Omer Atoum


Department of Mathematics and Computer Science,
East Central University
Ada, Oklahoma
[email protected]

978-1-7281-7624-6/20/$31.00 ©2020 IEEE
DOI 10.1109/CSCI51800.2020.00056

Abstract: In recent years, with the spread of social media platforms across the globe, especially among young people, cyberbullying and aggression have become a serious and annoying problem that communities must deal with. Such platforms provide various ways for bullies to attack and threaten others in their communities. Various techniques and methodologies have been used or proposed to combat cyberbullying through early detection and alerts to discover and/or protect victims from such attacks. Machine learning (ML) techniques have been widely used to detect language patterns that are exploited by bullies to attack their victims. Also, Sentiment Analysis (SA) of social media content has become one of the growing areas of research in machine learning. SA provides the ability to detect cyberbullying in real-time. This paper proposes an SA model for identifying cyberbullying texts in Twitter social media. Support Vector Machines (SVM) and Naïve Bayes (NB) are used in this model as supervised machine learning classification tools. The results of the experiments conducted on this model showed encouraging outcomes when a higher n-gram language model is applied to such texts, in comparison with similar previous research. Also, the results showed that SVM classifiers have better performance measures than NB classifiers on such tweets.

Keywords: Cyberbullying, sentiment analysis, machine learning, social media

I. INTRODUCTION

Social media has been used by almost all people, especially young adults, as a major medium of communication. According to [1], young adults were among the earliest social media adopters and continue to use it at high levels; usage by older adults has also increased in recent years, as shown in Fig. 1.

Figure 1. Percentage of U.S. Adults Who Use Social Media Sites by Age.

As a result of such wide usage of social media among adults, cyberbullying or cyber aggression has become a major problem for social media users. This has led to an increasing number of cyber victims who have suffered physically, emotionally, and/or mentally.

Cyberbullying can be defined as a type of harassment that takes place online on social networks. Criminals rely on such networks to collect data and information that enable them to execute their crimes, for example, by identifying a vulnerable victim [2]. Therefore, researchers have been working on finding methods and techniques that would detect and prevent cyberbullying. Recently, monitoring systems for cyberbullying have attracted a considerable amount of research; their goal is to efficiently identify cyberbullying cases [3]. The major idea behind such systems is the extraction of some features from social media texts, followed by building classifier algorithms to detect cyberbullying based on the extracted features. Such features could be based on users, content, emotions, and/or social networks. Furthermore, machine learning methods have been used to detect language pattern features from texts written by bullies.

The research in detecting cyberbullying has been mostly done either through filtration techniques or through machine learning techniques. In filtration techniques, profane words or idioms have to be detected from texts to identify cyberbullying [4]. Filtration techniques usually use machine learning methods to build classifiers that are capable of detecting cyberbullying using corpora of data collected from social networks such as Facebook and Twitter. For instance, in [5], data were collected from Formspring and then labeled using Amazon Mechanical Turk [6]. Machine learning methods from the WEKA toolkit [7] were also employed to train and test these classifiers. Such techniques suffer from an inability to detect indirect language harassment [8].

Chen et al. [9] proposed a technique to detect offensive language constructs in social networks through the analysis of features related to the users' writing styles, structures, and certain cyberbullying contents in order to identify potential bullies. The basic technique used in that study is a lexical syntactic feature that was able to detect offensive content in texts sent by bullies. Their results indicated a very high precision rate (98.24%) and a recall of 94.34%.

Nandhini and Sheeba [10] proposed a technique for detecting cyberbullying based on an NB classifier using data collected from MySpace. They reported an accuracy of 91%. Romsaiyud et al. [11] employed an enhanced NB classifier to extract cyberbullying words and clustered loaded patterns. They had achieved an accuracy of
95.79% using a corpus from Slashdot, Kongregate, and MySpace.

In this research, we use the Sentiment Analysis (SA) method for the classification of tweets into positive, negative, or neutral with respect to cyberbullying. We propose a technique for preprocessing tweets, and then we train and test two supervised machine learning classifiers, namely Support Vector Machine (SVM) and Naïve Bayes (NB). Then we compare our proposed technique with similar work presented in [12].

Section two presents the background for this research. Section three presents the proposed tweets sentiment analysis model. Section four presents the experiments and results of this proposed model. Finally, section five presents the conclusions of this research.

II. BACKGROUND

Machine learning (ML) is a method of data analysis that automates analytical model building. ML algorithms are often categorized as supervised or unsupervised. Supervised ML algorithms apply what has been learned in the past to new data using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. Unsupervised ML algorithms are used when the information used to train is neither classified nor labeled. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. The problem with unsupervised ML is that they may overlap and learn to localize texts with minimal unsupervised algorithms. Many researchers have used supervised learning approaches on data related to publicly released corpora [13].

Naïve Bayes (NB) classifiers, as supervised learning models, are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) independence assumptions between the features. They are among the simplest Bayesian network models. NB often relies on the bag-of-words representation of a document, where it collects the most used words and neglects other infrequent words. The bag of words depends on the feature extraction method to provide the classification of some data [14]. Furthermore, NB has a language model that represents each text as unigrams, bigrams, or n-grams and tests the probability of the query corresponding to a specific document.

Support-Vector Machines (SVMs) are also supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting) [15]. The most important models for SVM text classification are Linear and Radial Basis functions. Linear classification trains on the dataset and then builds a model that assigns classes or categories [16]. It represents the features as points in space, each predicted to belong to one of the assigned classes. SVM has good classification performance in several fields, but is mostly applied to image recognition and text classification.

Classifiers for SA are usually based on predicted classes and polarity, and/or on the level of classification (sentence or document). Lexicon-based SA text extraction is annotated with semantic orientation polarity and strength. SA has shown that light stemming comes in handy for both the accuracy and the performance of classification [17].

An automatic classifier of text documents based on NB and SVM algorithms was presented in [18]; the results indicated that the SVM algorithm handled text document classification better than the NB algorithm. Therefore, in this research and for the proposed SA techniques, we have used two supervised ML approaches for the classification of social media texts, namely Naïve Bayes (NB) and Support Vector Machine (SVM).

To evaluate our classifiers, several evaluation metrics could be used. We have adopted the most commonly used criteria, namely accuracy, precision, recall, F-measure, and Receiver Operating Characteristics (ROC). These criteria are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 * (Recall * Precision) / (Recall + Precision)
ROC is a plot of the TP rate against the FP rate

Where:
TP (True Positive) is a hit; correctly classified as positive.
TN (True Negative) is a rejection; correctly classified as negative.
FP (False Positive) is a false alarm; falsely classified as positive.
FN (False Negative) is a miss; falsely classified as negative.

III. PROPOSED TWEETS SA MODEL

The proposed SA model analyzes, mines, and classifies tweets. Several preprocessing stages must be applied to the collected tweets for the SA process to be more effective, as illustrated in Fig. 2. These stages are as follows:

A. Collecting Tweets:

A connection to Twitter is created to collect a corpus of tweets. A read-only application is built to collect written tweets from Twitter. Tweet extraction helps in extracting the important content of a tweet (the essence). Hence, what is needed from a tweet is the text written after the hashtags, from which the feature words are subsequently extracted, that is, words that carry a message for the user, whether it is a positive, negative, or neutral cyberbullying tweet. Also, tweet extraction is needed to facilitate analyzing the feature vector and the selection process (unigrams, bigrams, trigrams, …, n-grams), and to facilitate the classification of both the training and testing sets of tweets.
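
The n-gram feature vectors mentioned above (unigrams, bigrams, trigrams, …, n-grams) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `ngrams` and `ngram_counts`, the whitespace tokenizer, and the sample text are assumptions made purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2, 3)):
    """Count unigrams, bigrams, and trigrams (by default) in one text.

    Assumption: lowercasing plus whitespace splitting stands in for the
    fuller preprocessing stages described in this section.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    # Hypothetical example text, not taken from the collected corpus.
    sample = "nobody likes you nobody"
    for gram, count in sorted(ngram_counts(sample).items()):
        print(" ".join(gram), count)
```

Each distinct n-gram becomes one candidate feature, so the vocabulary grows quickly as n increases, which is one reason a feature selection stage is useful before classification.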

Figure 2. Proposed Tweets Preprocessing Stages.

B. Cleaning and Annotations of Tweets:

The tweets are cleaned by removing special symbols and various characters and emoticons. Those symbols and characters may lead to a different classification from what the user originally intended in the tweet. Hence, we replace special symbols, emoticons, and emotional characters with their meanings. Table 1 presents some of the special symbols that we have used, along with their meanings and sentiments. Furthermore, all "http/https" shortened links and special symbols such as (*, &, $, %, -, _, >, <) are removed from the collected tweets. Then each special character is replaced with a space character.

The annotation of the collected tweets was done manually. As a result of this annotation, each tweet is labeled as positive, negative, or neutral cyberbullying. Finally, the cleaned, annotated, extracted tweets are stored in a database in comma-separated values format for further manipulation.

TABLE 1. Sample Special Symbols and Their Meanings.

Character/Symbol            Meaning            Sentiment
♥                           Heart or love      Positive
[smile symbol]              Smile              Positive
[sad-face symbol]           Sad                Negative
[snow symbol]               Snow               Positive or negative
[bird or airplane symbol]   Bird or Airplane   Neutral
?                           Question           Neutral

C. Normalization:

The normalization stage starts by removing all extra spaces. All non-standard words that contain numbers and/or dates are identified. Such words are mapped into specially built vocabularies. This results in a smaller tweet vocabulary and improves the accuracy of the classification task.

D. Tokenization:

Tokenization is an important step in SA since it reduces the typographical variation of words. The feature extraction process and the bag of words require tokenization. A dictionary of features is used to transform words into feature vectors, or feature indices, such that the index of the feature (word) in the vocabulary is linked to its frequency in the whole training corpus.

E. Named Entity Recognition (NER):

NER is a significant tool in natural language processing; it allows the identification of proper nouns in unstructured text. NER has three categories of named entities: ENAMEX (person, organization, and country), TIMEX (date and time), and NUMEX (percentages and numbers).

F. Removing stop words:

Some stop words can help in attaining the full meaning of a tweet, and some of them are just extra characters that need to be removed. Some examples of stop words are "a," "and," "but," "how," "or," and "what." Such stop words do not affect the tweet's meaning and can be removed from tweets.

G. Stemming:

Tweet stemming is done by removing any attached suffixes, prefixes, and/or infixes from words in tweets. A stemmed word represents a broader concept of the original word; it may also save storage [19]. The goal of stemming tweets is to reduce derived or inflected words to their stem, base, or root form in order to improve SA. Furthermore, stemming helps in putting all the variations of a word into one bucket, effectively decreasing our entropy
and gives better concepts to the data. Moreover, N-gram is a traditional method that takes into consideration the occurrences of N words in a tweet and can identify formal expressions [20]. Hence, we have used N-grams in our SA.

In this research, we have implemented term frequency using Weka [21]. Term frequency assigns a weight to each term in a document based on the number of occurrences of the term in that document. It gives more weight to terms that appear more frequently in tweets, because these terms represent words and language patterns that are used more often by the tweeters.

H. Feature Selection

Feature selection techniques have been used successfully in SA [22] [23]. Features are ranked according to some measure so that non-useful or non-informative features can be removed to improve the accuracy and efficiency of the classification process. In this study, we have used the Chi-square and Information gain techniques to remove such irrelevant features.

IV. EXPERIMENTS AND RESULTS

To evaluate the performance of the machine learning methods used in this research, namely Naïve Bayes (NB) and the Support Vector Machine (SVM), we have collected a total of 5628 tweets (positive: cyberbullying, negative: no cyberbullying, and neutral). This set of tweets was manually classified into 1187 cyberbullying tweets, 2342 tweets with no cyberbullying, and 2099 neutral tweets. Table 2 presents the distribution of these tweets.

TABLE 2. Tweets Statistics

Total number of Tweets                         5628
Number of positive (cyberbullying) Tweets      1187
Number of negative (no cyberbullying) Tweets   2342
Number of neutral Tweets                       2099

Before conducting our experiments, the set of tweets went through the phases of cleaning, preprocessing, normalization, tokenization, named entity recognition, stemming, and feature selection, as discussed in the previous section. Then this data set was split in a ratio of (70, 30) for training and testing the NB and SVM classifiers. Finally, cross-validation is used, in which 10 equal-sized folds are produced.

Several experiments have been conducted to compare the performance of the NB and SVM classifiers on the above-collected set of tweets. In the first experiment, tweets with 2-gram, 3-gram, and 4-gram models are used to evaluate the NB and SVM classifiers in terms of accuracy, precision, recall, F-measure, and ROC. Table 3 presents the results of this experiment. Fig. 3 illustrates the averages of the measures obtained over the different n-gram models for both NB and SVM classifiers. From Table 3 and Fig. 3 we can conclude that the SVM classifiers have achieved higher results than the NB classifiers in all n-gram language models in terms of accuracy, precision, recall, F-measure, and ROC. For instance, the SVM classifiers achieved an accuracy of 92.02% in the case of the 4-gram language model, whereas the NB classifiers achieved an accuracy of 81.1% on the same language model. Also, the 4-gram language model has outperformed the other n-gram language models in all measures for the SVM classifiers, and in recall and F-measure for the NB classifiers. This is because a higher n-gram leads to an increase in the probability of estimation.

TABLE 3. NB and SVM Measures for Different N-gram Language Models

Measure     Classifier   2-gram   3-gram   4-gram   Average
Accuracy    NB           82.35    81.7     81.1     82.025
Accuracy    SVM          91.21    91.7     92.02    91.64
Precision   NB           78.46    78.68    78.42    78.52
Precision   SVM          88.92    89.1     89.3     89.11
Recall      NB           77.31    79.4     79.71    78.81
Recall      SVM          86.28    87.36    88.04    87.23
F-Measure   NB           77.88    79.04    79.06    78.66
F-Measure   SVM          87.58    88.22    88.66    88.16
ROC         NB           78.61    77.9     78.03    77.9
ROC         SVM          88.2     88.56    89.3     88.93

Figure 3. Graphical Comparisons of NB and SVM Measures
(Bar chart of Accuracy, Precision, Recall, F-Measure, and ROC for NB and SVM; series: 2-gram, 3-gram, 4-gram.)

Another experiment was conducted to compare our proposed classifiers to the work presented in [12] using the two major classification techniques, namely Naïve Bayes (NB) and Support Vector Machine (SVM), on the same data set presented earlier. Table 4 presents the summarized performance measures of our proposed techniques in implementing the NB and SVM classifiers in comparison with the implementation of [12]. It is clear from Table 4 and Fig. 4 that in most measures we obtained better results.
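
The evaluation criteria defined in Section II can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the experiments; the function names and the confusion-matrix counts are hypothetical. The last lines recompute an F-measure from the average NB precision (78.52) and recall (78.8) reported in Table 4 for the proposed work, which suggests that the Avg. F-Meas. column there is the harmonic mean of the averaged precision and recall columns.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def f_measure(prec, rec):
    """F-measure = 2 * (Recall * Precision) / (Recall + Precision)."""
    return 2 * (rec * prec) / (rec + prec)


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, chosen only for illustration.
    tp, tn, fp, fn = 80, 90, 20, 10
    p, r = precision(tp, fp), recall(tp, fn)
    print("accuracy :", accuracy(tp, tn, fp, fn))
    print("precision:", p)
    print("recall   :", r)
    print("F-measure:", f_measure(p, r))

    # Average NB precision and recall of the proposed work (Table 4);
    # the result is approximately 78.66, in line with the table entry.
    print("Table 4 NB F-measure:", f_measure(78.52, 78.8))
```

Because the formulas take counts for a single positive class, applying them to the three-way labeling used here (positive, negative, neutral) would require a per-class or averaged variant; which variant was used is not stated in this paper.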

TABLE 4. Averages of NB and SVM Measures for Different N-gram Language Models

Work            Classifier   Avg. Accur.   Avg. Recall   Avg. Prec.   Avg. F-Meas.   Avg. ROC
Proposed work   NB           81.71         78.8          78.52        78.65975       77.9
Proposed work   SVM          91.64         87.22         89.1         88.14997       88.93
Previous work   NB           80.9          79.1          77.04        78.05641       77.02
Previous work   SVM          83.46         85.3          84.32        84.80716       85.71

Figure 4. Averages of Graphical Comparisons of NB and SVM Measures
(Bar chart of Avg. Accuracy, Avg. Recall, Avg. Precision, Avg. F-Measure, and Avg. ROC for NB and SVM in the proposed work and the previous work.)

Furthermore, as shown in Fig. 4, our SVM classifiers have better performance measures than the SVM classifiers of the previous work. For instance, we have obtained an average accuracy of 91.64 in the proposed work, in contrast to an average accuracy of 83.46 in the previous work. Also, the average ROC of our SVM classifier is 88.93, compared to 85.71 for the SVM of the previous work.

This is an encouraging result, since ROC compares the true-positive rate (the sensitivity, or recall, in machine learning) with the false-positive rate.

V. CONCLUSIONS

In this research, we have proposed an approach to detect cyberbullying on the Twitter social media platform based on Sentiment Analysis that employs machine learning techniques, namely Naïve Bayes and Support Vector Machine. The data set used in this research is a collection of tweets that have been classified into positive, negative, or neutral cyberbullying. Before training and testing these machine learning techniques, the collected set of tweets went through several phases: cleaning, annotation, normalization, tokenization, named entity recognition, removing stop words, stemming and n-grams, and feature selection.

The results of the conducted experiments have indicated that SVM classifiers have outperformed NB classifiers in almost all performance measures over all language models. Specifically, SVM classifiers have achieved an accuracy of 92.02%, while the NB classifiers have achieved an accuracy of 81.1%, on the 4-gram language model.

Furthermore, more experiments have been conducted to compare our proposed work with the similar work of [12]. These experiments also indicated that our SVM and NB classifiers had slightly better performance measures when compared to this previous work.

Finally, as a direction for future research in cyberbullying detection, we would like to explore other machine learning techniques, such as Neural Networks and deep learning, with larger sets of tweets. We would also like to adopt some proven methods for an automated annotation process to handle such a large set of tweets.

ACKNOWLEDGMENT

I would like to gratefully acknowledge the support of East Central University, Ada, Oklahoma, and the Department of Mathematics and Computer Science at ECU for providing the funds for my paper registration in this conference.

REFERENCES

[1] JUNE 12, 2019, PEW Research Center, Internet & Technology: Social Media Fact Sheet. https://www.pewresearch.org/internet/fact-sheet/social-media/, accessed March 28, 2020.
[2] Tavani, Herman T., “Introduction to Cyberethics: Concepts, Perspectives, and Methodological Frameworks”, in H. T. Tavani, Ethics and Technology: Controversies, Questions, and Strategies for Ethical Computing, Rivier University, Fourth Edition, Wiley, pp 1-2, 2013.
[3] S. Salawu, Y. He, and J. Lumsden, “Approaches to Automated Detection of Cyberbullying: A Survey,” Vol. 3045, no c, pp 1-20, 2017.
[4] “Internet Monitoring and Web Filtering Solutions”, PEARL SOFTWARE, 2015. [Online]. Available: http://www.pearlsoftware.com/solutions/cyber-bullying-inschools.html. [Accessed Feb 20, 2020]
[5] K. Reynolds, “Using Machine Learning to Detect Cyberbullying”, 2012.
[6] “Amazon Mechanical Turk”, Aug. 15, 2014. [Online]. Available: http://ocs.aws.amazon.com/AWSMMechTurk/latest/AWSMechanical-TurkGetingStartedGuide/SvcIntro.html. Accessed July 3, 2020.
[7] S. Garner, “Weka: The Waikato Environment for Knowledge Analysis”, New Zealand, 1995.
[8] V. Nahar, X. Li and C. Pang, “An Effective Approach for Cyberbullying Detection,” in Communication in Information Science and Management Engineering, May 2013.
[9] Chen, Y., Zhou, Y., Zhu, S. and Xu, H., “Detecting Offensive Language in Social Media to Protect Adolescent Online Safety”, in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on Social Computing (SocialCom), pp 71-80, 2012.
[10] B. Sri Nandhini, and J.I. Sheeba, “Online Social Network Bullying Detection Using Intelligence Techniques”, International Conference on Advanced Computing Technologies and Applications (ICACTA-2015), Procedia Computer Science 45 (2015) 485 – 492
[11] Walisa Romsaiyud, Kodchakorn na Nakornphanom, Pimpaka
Prasertslip, Piyapon Nurarak, and Pirom Konglerd, “Automated
Cyberbullying Detection Using Clustering Appearance Patterns”, in
Knowledge and Smart Technology (KST), 2017 9th International
Conference on, pages 242-247, IEEE, 2017.
[12] Dipika Jiandani, Riddhi Karkera, Megha Manglani, Mohit Ahuja,
Mrs. Abha Tewari, “Comparative Analysis of Different Machine
Learning Algorithms to Detect Cyber-bullying on Facebook”,
International Journal for Research in Applied Science & Engineering
Technology (IJRASET), Volume 6 Issue IV, April 2018, pp. 2322-
2328.
[13] Cristina Bosco and Viviana Patti and Andrea Bolioli, “Developing
Corpora for Sentiment Analysis: The Case of Irony and Senti–TUT”,
Proceedings of the Twenty-Fourth International Joint Conference on
Artificial Intelligence (IJCAI 2015), pp. 4158-4162.
[14] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian Network
Classifiers", Machine learning, Vol. 29, No. 2–3, pp. 131-163, 1997.
[15] Cortes, Corinna; Vapnik, Vladimir N., "Support-Vector Networks"
(PDF). Machine Learning. 20 (3): 273–297. (1995), CiteSeerX
10.1.1.15.9362. doi:10.1007/BF00994018.
[16] C. Cortes, and V. Vapnik. "Support-vector networks". Machine
Learning, Vol. 20, No. 3, pp. 273–297,1995
doi:10.1007/BF00994018.
[17] Leimin Tian, Catherine Lai, and Johanna D. Moore, “Polarity and
Intensity: The Two Aspects of Sentiment Analysis”, Proceedings of
the First Grand Challenge and Workshop on Human Multimodal
Language (Challenge-HML), pages 40–47, Melbourne, Australia
July 20, 2018. 2018, Association for Computational Linguistics.
[18] Monali Bordolo, and Saroj Kr. Biswas, “Sentiment Analysis of
Product using Machine Learning Technique: A Comparison among
NB, SVM and MaxEnt”, July 2018, International Journal of Pure and
Applied Mathematics 118(18):71-83
[19] Brajendra Singh Rajput, and Nilay Khare, “A Survey of Stemming
Algorithms for Information Retrieval”, IOSR Journal of Computer
Engineering (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727,
Volume 17, Issue 3, Ver. VI (May – Jun. 2015), PP 76-8
[20] L. Chen, W. Wang, M. Nagaraja, S. Wang, and A. Sheth, "Beyond
Positive/Negative Classification: Automatic Extraction of Sentiment
Clues from Microblogs,", Kno.e.sis Center, Technical Report, 2011.
[21] G. Holmes, A. Donkin, and I. Witten, “WEKA: A Machine Learning
Workbench,” In Proceedings of the 1994 Second Australian and New
Zealand Conference on Intelligent Information Systems, Brisbane, 29
November-2 December 1994, 357-361.
[22] Khalifa, K., and Omar, N., “A Hybrid Method Using Lexicon-Based
Approach and Naïve Bayes Classifier for Arabic Opinion Question
Answering,” Journal of Computer Science 10 (11): 1961-1968, 2014,
ISSN: 1549-3636.
[23] Fattah MA, “A Novel Statistical Feature Selection Approach for Text
Categorization”. J Inf Process Syst 13:1397–1409. (2017),
https://doi.org/10.3745/JIPS.02.0076
[24] Guyon I, Elisseeff A, “An Introduction to Variable and Feature
Selection”. J Mach Learn Res 3:1157–1182. (2003)
https://doi.org/10.1016/j.aca.2011.07.027
