Automatic Hate Speech Detection Using Machine Learning
Article in International Journal of Advanced Computer Science and Applications, 2020.
DOI: 10.14569/IJACSA.2020.0110861
Abstract—The increasing use of social media and information sharing has given major benefits to humanity. However, it has also given rise to a variety of challenges, including the spreading and sharing of hate speech messages. To address this emerging issue on social media sites, recent studies have employed a variety of feature engineering techniques and machine learning (ML) algorithms to automatically detect hate speech messages on different datasets. However, to the best of our knowledge, no study compares these feature engineering techniques and machine learning algorithms to evaluate which combination performs best on a standard, publicly available dataset. Hence, the aim of this paper is to compare the performance of three feature engineering techniques and eight machine learning algorithms on a publicly available dataset with three distinct classes. The experimental results showed that bigram features, when used with the support vector machine algorithm, performed best with 79% overall accuracy. Our study holds practical implications and can be used as a baseline in the area of automatic hate speech detection. Moreover, the output of the different comparisons can serve as a state-of-the-art reference against which future research on automated text classification techniques can be compared.

Keywords—Hate speech; online social networks; natural language processing; text classification; machine learning
I. INTRODUCTION

In recent years, hate speech has been increasing in both in-person and online communication. Social media and other online platforms play an extensive role in the breeding and spread of hateful content, which eventually leads to hate crime. For example, according to recent surveys, rises in online hate speech content have been linked to hate crimes surrounding Trump's election in the US [2], the Manchester and London attacks in the UK [3], and the terror attacks in New Zealand [4]. To tackle these harmful consequences of hate speech, different steps, including legislation, have been taken by the European Union Commission, which recently required social media networks to sign an EU hate speech code committing them to remove hate speech content within 24 hours [1]. However, the manual process of identifying and removing hate speech content is labor-intensive and time-consuming. Due to these concerns and the widespread hate speech content on the internet, there is a strong motivation for automatic hate speech detection.

The automatic detection of hate speech is a challenging task due to disagreements over different definitions of hate speech: some content might be hateful to some individuals and not to others, depending on the definition used. According to [5], hate speech is:

"the content that promotes violence against individuals or groups based on race or ethnic origin, religion, disability, gender, age, veteran status, and sexual orientation/gender identity".

Despite these differing definitions, several recent studies have claimed favorable results for automatic hate speech detection in text [21-32]. The proposed solutions employed different feature engineering techniques and ML algorithms to classify content as hate speech. Regardless of this extensive amount of work, it remains difficult to compare the performance of these approaches. To the best of our knowledge, the existing studies lack a comparative analysis of different feature engineering techniques and ML algorithms.

Therefore, this study contributes to solving this problem by comparing three feature engineering techniques and eight ML classifiers on a standard hate speech dataset. Table I shows major concepts related to automatic text classification along with their explanations and references. This study holds practical importance and can serve as a reference for new researchers in the domain of automatic hate speech detection.

The rest of the paper is organized as follows: Section II highlights the related work. Section III discusses the methodology. Sections IV, V, and VI explain the experimental settings, results, and discussion, respectively. Finally, Section VII discusses the limitations, future work, and conclusion.
II. RELATED WORK

…classifier to perform their experiments, achieving 87% accuracy. Irene Kwok et al. [24] employed an ML-based approach for the automatic detection of racism against Black users in the Twitter community. They employed unigrams with a BOW-based technique to generate numeric vectors, which were fed to a Naïve Bayes classifier. Their experiments obtained a maximum of 76% accuracy. Sanjana Sharma et al. [25] classified hate speech on Twitter using BOW features, also fed to a Naïve Bayes classifier, and reported a maximum of 73% accuracy.

BOW features have thus shown reasonable accuracy in social network text classification. However, the major disadvantage of this technique is that word order is ignored, which causes misclassification when the same words are used in different contexts. To overcome this limitation, researchers have proposed N-gram-based approaches [7].

Zeerak Waseem et al. [28] classified hate speech on Twitter. They employed character n-gram feature engineering techniques to generate numeric vectors, fed them to an LR classifier, and obtained an overall 73% F-score. Chikashi Nobata et al. [27] used an ML-based approach to detect abusive language in online user content. They employed a character n-gram feature representation and fed the features to an SVM classifier, which obtained an overall 77% F-score. Shervin Malmasi et al. [26] used an ML-based approach to classify hate speech in social media. They employed 4-grams with character-gram feature engineering techniques to generate numeric features, fed them to an SVM classifier, and reported a maximum of 78% accuracy.

In recent years, a few researchers have employed ML approaches specifically for automatic hate speech detection. For example, Karthik Dinakar et al. [29] classified sensitive topics from social media comments and posts. They employed unigrams with the TFIDF feature representation to generate numeric feature vectors and fed them to four ML classifiers, namely Naïve Bayes, a rule-based classifier, J48, and SVM. Their experimental results showed that the rule-based classifier outperformed NB, J48, and SVM, obtaining 73% accuracy. Shuhua Liu et al. [30] classified web content pages into hatred or violence categories. They used trigram features represented with TFIDF and a Naïve Bayes classifier, which obtained a highest accuracy of 68%.

The N-gram-based approach gives better results than the BOW-based approach, but it has two major limitations. First, related words may be far apart in a sentence; second, increasing the value of N results in slow processing speed [32].

More recently, authors have employed deep-learning-based NLP techniques to classify hate speech messages. Sebastian Köffer et al. [31] employed word2vec features and SVM classifiers to classify German hate speech messages and obtained a 67% F-score. Word2vec showed the lowest results because such approaches need enormous amounts of data to learn complex word semantics.

There has also been a notable attempt at the construction of datasets and the detection of hate speech and offensive language in other languages (e.g., Danish). An important research study [45] in 2019 constructed a Danish dataset for hate speech and offensive language detection, containing comments from Reddit and Facebook annotated with the various types and targets of offensive language. The authors achieved a highest F1-score of 0.74 using deep learning models with different feature sets.

Schmidt et al. [46] conducted a survey on hate speech detection using natural language processing in 2017. The authors discussed in detail studies on the various feature engineering techniques used for supervised classification of hate speech messages. The major drawback of this survey is that it presented no experimental results for the techniques it covered.

Previous studies show that researchers across the globe are working on hate speech recognition in different languages such as German, Dutch, and English. However, to the best of our knowledge, no study provides a comparative analysis of various features and ML algorithms on a standard dataset that can serve as a baseline for future researchers in the field of hate speech recognition. Hence, in this study, we compared three feature engineering techniques and eight ML classifiers to evaluate which combination works best on a hate speech dataset (discussed in Section III).

III. METHODOLOGY

This section explains the proposed system, which we employed to classify tweets into three different classes, namely, hate speech, offensive but not hate speech, and neither hate speech nor offensive speech. Fig. 1 shows the complete research methodology. As shown in the figure, the methodology consists of six key steps: data collection, data preprocessing, feature engineering, data splitting, classification model construction, and classification model evaluation. Each step is discussed in detail in the subsequent sections.

Fig. 1. System Overview.
A. Data Collection

In this research study, we used a publicly available dataset of hate speech tweets, compiled and labeled by CrowdFlower. In this dataset, the tweets are labeled with three distinct classes, namely, hate speech, not offensive, and offensive but not hate speech. The dataset contains 14,509 tweets. Of these, 16% belong to the hate speech class, 50% to the not offensive class, and the remaining 33% to the offensive but not hate speech class. The details of this distribution are also shown in Fig. 2.
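As a minimal sketch, the labeled tweets could be loaded with pandas; the file name and column names below are hypothetical, since the paper does not specify the distribution format of the dataset:

```python
import pandas as pd

# Hypothetical file/column names; the paper only states that the dataset is
# a CrowdFlower-labeled collection of tweets with three classes (0, 1, 2).
df = pd.read_csv("labeled_tweets.csv")
tweets = df["tweet"].tolist()
labels = df["class"].tolist()  # 0 = hate speech, 1 = not offensive, 2 = offensive but not hate speech

# Should show roughly the 16% / 50% / 33% distribution reported above.
print(df["class"].value_counts(normalize=True))
```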
B. Text Preprocessing

Several research studies have shown that text preprocessing improves classification results [33]. Therefore, we applied different preprocessing techniques to our dataset to filter noisy and non-informative features from the tweets. In preprocessing, we converted the tweets to lower case and removed all URLs, usernames, white spaces, hashtags, punctuation, and stop-words using pattern-matching techniques. Besides this, we also performed tokenization and stemming on the preprocessed tweets. Tokenization converts each tweet into tokens (words), and the Porter stemmer then reduces each word to its root form, for example, offended to offend.
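A minimal sketch of this pipeline in Python, assuming NLTK for the stop-word list and the Porter stemmer (the paper names the techniques but not the implementation):

```python
import re
import string

from nltk.corpus import stopwords  # requires nltk.download("stopwords") once
from nltk.stem import PorterStemmer

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess_tweet(tweet: str) -> str:
    """Lower-case a tweet, strip noisy patterns, tokenize, and stem."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # usernames
    text = re.sub(r"#\w+", " ", text)          # hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(STEMMER.stem(t) for t in tokens)  # e.g. "offended" -> "offend"

clean_tweets = [preprocess_tweet(t) for t in tweets]  # tweets from the loading sketch
```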
C. Feature Engineering

ML algorithms cannot learn classification rules from raw text; they require numerical features. Hence, feature engineering is one of the key steps in text classification: it extracts the key features from the raw text and represents them in numerical form. In this study, we applied three different feature engineering techniques, namely, n-grams with TFIDF [8], Word2vec [9], and Doc2vec [10].
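For illustration, the bigram TFIDF representation used in the experiments could be built with scikit-learn, and per-tweet document vectors with Gensim's Doc2Vec; these library choices and parameter values are our assumptions, not details from the paper:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.feature_extraction.text import TfidfVectorizer

# clean_tweets: list of preprocessed tweet strings (see preprocessing sketch).

# 1) Bigrams weighted by TFIDF -> sparse document-term matrix.
vectorizer = TfidfVectorizer(ngram_range=(2, 2))
X_bigram_tfidf = vectorizer.fit_transform(clean_tweets)

# 2) Doc2vec -> one dense vector per tweet (vector_size is an assumed value).
tagged = [TaggedDocument(t.split(), [i]) for i, t in enumerate(clean_tweets)]
d2v = Doc2Vec(tagged, vector_size=100, min_count=2, epochs=20)
X_doc2vec = [d2v.dv[i] for i in range(len(clean_tweets))]
```

For Word2vec, tweet-level vectors are commonly obtained by averaging the word vectors of a tweet's tokens, though the paper does not state how its word vectors were aggregated.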
D. Data Splitting

Table II shows the class-wise distribution of the overall dataset as well as of the data after splitting into a training set and a test set. We used an 80-20 ratio to split the preprocessed data (80% for training, 20% for testing). The training data is used to train the classification model, i.e., to learn the classification rules, while the test data is used to evaluate the classification model.

TABLE II. CLASS-WISE DISTRIBUTION OF THE DATASET BEFORE AND AFTER SPLITTING

Class | Total instances | Training instances | Testing instances
0 Hate speech | 2,399 | 1,909 | 490
1 Not offensive | 7,274 | 5,815 | 1,459
2 Offensive but not hate speech | 4,836 | 3,883 | 953
Total | 14,509 | 11,607 | 2,902
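A sketch of this split with scikit-learn; stratification is an assumption on our part (the paper does not say how the split was drawn, though the per-class counts in Table II are close to class-proportional):

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_bigram_tfidf, labels,  # features and labels from the earlier sketches
    test_size=0.20,          # the paper's 80-20 ratio
    stratify=labels,         # assumed; preserves class proportions
    random_state=0,
)
```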
E. Machine Learning Models

According to the "no free lunch" theorem [34], there is no single classifier that performs best on all kinds of datasets. Therefore, it is recommended to apply several different classifiers to a master feature vector and observe which one achieves the best results. Hence, we selected eight different classifiers: NB [12], SVM [14], KNN [15], DT [16], RF [13], AdaBoost [17], MLP [18], and LR [19].

F. Classifier Evaluation

In this step, the constructed classifier predicts the class of unlabeled text (i.e., hate speech, offensive but not hate speech, or neither hate speech nor offensive speech) on the test set. Classifier performance is evaluated by counting true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP). These four numbers constitute a confusion matrix, as in Fig. 3. Different performance metrics, briefly discussed below, are used to assess the constructed classifier; more details on performance metrics can be found in [35].

1) Precision: Precision is also known as the positive predictive value. It is the proportion of predicted positives that are actually positive:

$Precision = \frac{TP}{TP + FP}$ (1)

2) Recall: Recall is the proportion of actual positives that are predicted positive:

$Recall = \frac{TP}{TP + FN}$ (2)

3) F-Measure: The F-measure is the harmonic mean of precision and recall; the standard F-measure (F1) gives equal weight to both:

$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (3)
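Continuing the sketches above, the eight classifiers and these metrics could be wired together in scikit-learn as follows; the specific estimator classes and default hyperparameters are our assumptions, since the paper lists only the algorithm families:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "NB": MultinomialNB(),
    "SVM": LinearSVC(),
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "MLP": MLPClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Weighted averaging is an assumption; it accounts for the class
    # imbalance visible in Table II.
    p, r, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
    print(confusion_matrix(y_test, y_pred))
```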
IV. EXPERIMENTAL SETTINGS

As mentioned in Section III-C, we used three types of features, namely n-grams (bigrams) with TFIDF, Word2vec, and Doc2vec. Hence, we have a total of three different master feature representations. In addition, eight different ML algorithms were applied to the three master feature vectors. Overall, therefore, 24 analyses (3 master feature vectors × 8 ML algorithms) were evaluated to check the effectiveness of the classification models.
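The full grid of 24 analyses could then be expressed as a nested loop over the feature representations and the classifier dictionary sketched earlier; the averaged-word-vector construction for Word2vec is our assumption:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split

# Assumed Word2vec tweet vectors: average of the tweet's word vectors.
tokenized = [t.split() for t in clean_tweets]
w2v = Word2Vec(tokenized, vector_size=100, min_count=2, epochs=20)
def avg_vec(tokens):
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)
X_word2vec = np.array([avg_vec(t) for t in tokenized])

feature_sets = {
    "bigram_tfidf": X_bigram_tfidf,
    "word2vec": X_word2vec,
    "doc2vec": np.array(X_doc2vec),
}

scores = {}
for feat_name, X in feature_sets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.20, stratify=labels, random_state=0)
    for clf_name, clf in classifiers.items():
        # Note: MultinomialNB requires non-negative features, so for the
        # dense embeddings it would need to be swapped for e.g. GaussianNB.
        clf.fit(X_tr, y_tr)
        scores[(feat_name, clf_name)] = clf.score(X_te, y_te)  # accuracy
```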
V. RESULTS

This section explains the overall results of the 24 analyses. Tables III to VI show the precision, recall, F-measure, and accuracy of all 24 analyses, respectively. The bold values mark the maximum and minimum results, and all tables report performance for the different feature representations and classification techniques applied in the experimental settings. Across all 24 analyses, the lowest precision (0.58), recall (0.57), accuracy (57%), and F-measure (0.47) were found for the MLP and KNN classifiers using the TFIDF representation with bigram features. Moreover, the highest recall (0.79), precision (0.77), accuracy (79%), and F-measure (0.77) were obtained by SVM using the TFIDF representation with bigram features. Among the feature representations, bigram features with TFIDF obtained the best performance compared to Word2vec and Doc2vec, although there was only a fringe difference between the bigram and Doc2vec results. Among the classification models, the SVM classifier performed best of all eight classifiers. The AdaBoost and RF results were lower than SVM's but better than those of LR, DT, NB, KNN, and MLP.

Furthermore, Fig. 4 and Fig. 5 show the confusion matrices of the best-performing analyses. Fig. 4 shows the SVM classifier's confusion matrix using bigram features with TFIDF. As shown there, out of the 490 tweets belonging to the hate speech class, only 155 were correctly classified and 335 were misclassified: of these 335 instances, 54 were falsely classified as not offensive and 281 as offensive but not hate speech. Of the 1,459 instances belonging to the second class, 1,427 tweets were correctly classified as not offensive; the remaining 32 were misclassified, 5 as hate speech and 27 as offensive but not hate speech. The remaining 953 instances of the 2,902-tweet test set belong to the offensive but not hate speech class. Here, the SVM classifier correctly classified 698 tweets as offensive but not hate speech, while 122 and 133 instances were misclassified as hate speech and not offensive, respectively.

Fig. 4. Confusion Matrix (Features: Bigram (TFIDF), Classifier: SVM).

Fig. 5 shows the confusion matrix of the AdaBoost classifier using bigram features with TFIDF. As shown there, the overall performance of the AdaBoost classifier is lower than that of the SVM classifier with the same features; AdaBoost only performed well on the offensive but not hate speech class.

Fig. 5. Confusion Matrix (Features: Bigram (TFIDF), Classifier: AdaBoost).
TABLE III. PRECISION OF ALL 24 ANALYSES
VI. DISCUSSION

…features and classifiers performed well for two classes (i.e., offensive but not hate speech, and neither hate speech nor offensive speech). Our experimental results showed that all 24 combinations performed worst for the hate speech class. The hate speech class has the fewest training instances of all classes (see Table II), but the major reason for its misclassification (as shown in Fig. 4 and Fig. 5) might be the overlap of bigrams that occur with higher frequency in classes other than hate speech. For example, bigrams like "lame nigga", "white trash", and "bitch made" appear more frequently in the offensive but not hate speech class than in the hate speech class. Hence, it is possible that the classifier learned weak classification rules for the hate speech class.
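This kind of overlap can be inspected directly. A small illustrative sketch that counts the most frequent bigrams per class, so dominant bigrams shared between classes stand out (this analysis is our own, not part of the paper's pipeline):

```python
from collections import Counter

def top_bigrams(tweet_texts, n=10):
    """Return the n most frequent bigrams across a list of token strings."""
    counts = Counter()
    for t in tweet_texts:
        tokens = t.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)

# Compare the dominant bigrams of two classes to spot overlapping features.
hate = [t for t, y in zip(clean_tweets, labels) if y == 0]
offensive = [t for t, y in zip(clean_tweets, labels) if y == 2]
print(top_bigrams(hate))
print(top_bigrams(offensive))
```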
VII. CONCLUSION

This study employed automated text classification techniques to detect hate speech messages, comparing three feature engineering techniques and eight ML algorithms. The experimental results showed that bigram features, when represented through TFIDF, performed better than the word2Vec and Doc2Vec feature engineering techniques. Moreover, the SVM and RF algorithms showed better results than LR, NB, KNN, DT, AdaBoost, and MLP; the lowest performance was observed for KNN. The outcomes of this research hold practical importance because they can serve as a baseline against which upcoming research on automatic text classification methods for hate speech detection can be compared. Furthermore, this study also holds scientific value because it reports experimental results in the form of more than one scientific measure used for automatic text classification. Our work has two important limitations. First, the proposed ML model is inefficient in terms of real-time prediction accuracy. Second, it only classifies a hate speech message into three classes and cannot identify the severity of the message. Hence, in the future, the objective is to improve the proposed ML model so that it can also predict the severity of hate speech messages. Moreover, to improve the proposed model's classification performance, two approaches will be used. First, lexicon-based techniques will be explored and assessed by comparison with current state-of-the-art results. Second, more data instances will be collected to learn the classification rules more effectively.
REFERENCES
[1] Hern, A., Facebook, YouTube, Twitter, and Microsoft sign the EU hate speech code. The Guardian, 2016. 31.
[2] Rosa, J. and Y. Bonilla, Deprovincializing Trump, decolonizing diversity, and unsettling anthropology. American Ethnologist, 2017. 44(2): p. 201-208.
[3] Travis, A., Anti-Muslim hate crime surges after Manchester and London Bridge attacks. The Guardian, 2017.
[4] MacAvaney, S., et al., Hate speech detection: Challenges and solutions. PloS one, 2019. 14(8): p. e0221152.
[5] Fortuna, P. and S. Nunes, A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 2018. 51(4): p. 85.
[6] Mujtaba, G., et al., Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study. Journal of forensic and legal medicine, 2018. 57: p. 41-50.
[7] Cavnar, W.B. and J.M. Trenkle. N-gram-based text categorization. in Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval. 1994. Citeseer.
[8] Ramos, J. Using tf-idf to determine word relevance in document queries. in Proceedings of the first instructional conference on machine learning. 2003. Piscataway, NJ.
[9] Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. in Advances in neural information processing systems. 2013.
[10] Le, Q. and T. Mikolov. Distributed representations of sentences and documents. in International conference on machine learning. 2014.
[11] Kotsiantis, S.B., I.D. Zaharakis, and P.E. Pintelas, Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 2006. 26(3): p. 159-190.
[12] Lewis, D.D. Naive (Bayes) at forty: The independence assumption in information retrieval. in European conference on machine learning. 1998. Springer.
[13] Xu, B., et al., An Improved Random Forest Classifier for Text Categorization. JCP, 2012. 7(12): p. 2913-2920.
[14] Joachims, T. Text categorization with support vector machines: Learning with many relevant features. in European conference on machine learning. 1998. Springer.
[15] Zhang, M.-L. and Z.-H. Zhou, A k-nearest neighbor based algorithm for multi-label classification. GrC, 2005. 5: p. 718-721.
[16] Abacha, A.B., et al., Text mining for pharmacovigilance: Using machine learning for drug name recognition and drug–drug interaction extraction and classification. Journal of biomedical informatics, 2015. 58: p. 122-132.
[17] Ying, C., et al., Advance and prospects of AdaBoost algorithm. Acta Automatica Sinica, 2013. 39(6): p. 745-758.
[18] Gardner, M.W. and S. Dorling, Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmospheric environment, 1998. 32(14-15): p. 2627-2636.
[19] Wenando, F.A., T.B. Adji, and I. Ardiyanto, Text classification to detect student level of understanding in prior knowledge activation process. Advanced Science Letters, 2017. 23(3): p. 2285-2287.
[20] Burnap, P. and M.L. Williams, Us and them: identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Science, 2016. 5(1): p. 11.
[21] Gitari, N.D., et al., A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 2015. 10(4): p. 215-230.
[22] Tulkens, S., et al., A dictionary-based approach to racism detection in dutch social media. arXiv preprint arXiv:1608.08738, 2016.
[23] Greevy, E. and A.F. Smeaton. Classifying racist texts using a support vector machine. in Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. 2004. ACM.
[24] Kwok, I. and Y. Wang. Locate the hate: Detecting tweets against blacks. in Twenty-seventh AAAI conference on artificial intelligence. 2013.
[25] Sharma, S., S. Agrawal, and M. Shrivastava, Degree based classification of harmful speech using twitter data. arXiv preprint arXiv:1806.04197, 2018.
[26] Malmasi, S. and M. Zampieri, Detecting hate speech in social media. arXiv preprint arXiv:1712.06427, 2017.
[27] Nobata, C., et al. Abusive language detection in online user content. in Proceedings of the 25th international conference on world wide web. 2016. International World Wide Web Conferences Steering Committee.
[28] Waseem, Z. and D. Hovy. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. in Proceedings of the NAACL student research workshop. 2016.
[29] Dinakar, K., R. Reichart, and H. Lieberman. Modeling the detection of textual cyberbullying. in fifth international AAAI conference on weblogs and social media. 2011.
[30] Liu, S. and T. Forss. Combining N-gram based Similarity Analysis with Sentiment Analysis in Web Content Classification. in KDIR. 2014.
[31] Köffer, S., et al., Discussing the value of automatic hate speech detection in online debates. Multikonferenz Wirtschaftsinformatik (MKWI 2018): Data Driven X-Turning Data in Value, Leuphana, Germany, 2018.
[32] Chen, Y., Detecting offensive language in social medias for protection of adolescent online safety. 2011.
[33] Shaikh, S. and S.M. Doudpotta, Aspects Based Opinion Mining for Teacher and Course Evaluation. Sukkur IBA Journal of Computing and Mathematical Sciences, 2019. 3(1): p. 34-43.
[34] Ho, Y.-C. and D.L. Pepyne, Simple explanation of the no-free-lunch theorem and its implications. Journal of optimization theory and applications, 2002. 115(3): p. 549-570.
[35] Seliya, N., T.M. Khoshgoftaar, and J. Van Hulse. A study on the relationships of classifier performance metrics. in 2009 21st IEEE international conference on tools with artificial intelligence. 2009. IEEE.
[36] Chaudhari, U.V. and M. Picheny, Matching criteria for vocabulary-independent search. IEEE Transactions on Audio, Speech, and Language Processing, 2012. 20(5): p. 1633-1643.
[37] Li, Y. and T. Yang, Word embedding for understanding natural language: a survey, in Guide to Big Data Applications. 2018, Springer. p. 83-104.
[38] Wang, Y., et al. Comparisons and selections of features and classifiers for short text classification. in IOP Conference Series: Materials Science and Engineering. 2017. IOP Publishing.
[39] Schapire, R.E., The boosting approach to machine learning: An overview, in Nonlinear estimation and classification. 2003, Springer. p. 149-171.
[40] Xu, B., Y. Ye, and L. Nie. An improved random forest classifier for image classification. in 2012 IEEE International Conference on Information and Automation. 2012. IEEE.
[41] Eftekhar, B., et al., Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data. BMC medical informatics and decision making, 2005. 5(1): p. 3.
[42] Dreiseitl, S., et al., A comparison of machine learning methods for the diagnosis of pigmented skin lesions. Journal of biomedical informatics, 2001. 34(1): p. 28-36.
[43] Singh, P.K. and M.S. Husain, Methodological study of opinion mining and sentiment analysis techniques. International Journal on Soft Computing, 2014. 5(1): p. 11.
[44] Bhatia, N., Survey of nearest neighbor techniques. arXiv preprint arXiv:1007.0085, 2010.
[45] Sigurbergsson, G.I. and L. Derczynski, Offensive language and hate speech detection for Danish. arXiv preprint arXiv:1908.04531, 2019.
[46] Schmidt, A. and M. Wiegand. A survey on hate speech detection using natural language processing. in Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. 2017. p. 1-10.