
Volume 9, Issue 7, July – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24JUL1058

Machine Learning-Based Strategies for Detecting Cyberbullying in Online Chats
Victor Ojodomo Akoh; Fati Oiza Ochepa
Department of Computer Science
Federal University Lokoja Kogi State, Nigeria

Abstract:- This study employed the stacking of three machine learning techniques, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Logistic Regression, to develop a model for detecting cyberbullying using a post dataset acquired from the X platform. The proposed model's task is to extract keywords from the post dataset and then classify them as either 1 ("cyberbullying word") or 0 ("not cyberbullying word"). The model achieved an accuracy of 85.52% and was deployed as a simple Graphical User Interface (GUI) web application. This study recommends that the model be incorporated into social media platforms to help reduce the growing use of cyberbullying phrases.

Keywords:- Cyberbullying, Machine Learning, Detection, Social Media.

I. INTRODUCTION

The Internet's global accessibility has significantly changed our perception of the world. Social media (SM) is a derivative of the World Wide Web; its usage is becoming increasingly popular, and it encompasses a variety of forms, including online news forums, gaming platforms, and dating apps, as well as social networking sites (e.g., Instagram, Facebook, and X). People of various ages, origins, and social and economic classes have gradually incorporated social networking into their lives, and social media facilitates connections between people from all around the world. X, a social media platform for sharing opinions, images, and videos, has become one of the most popular social networking platforms, allowing users to upload photographs and videos for other users to view and comment on.

Recently, there have been growing concerns about the use of social media platforms (especially X) to disseminate opinions that may be broadly categorized as offensive. These offensive posts may take the form of hate speech, cyberbullying, and other similar content. Hate speech is speech that diminishes or disparages an individual or a group based on their origin, ethnicity, sexuality, gender identity, disability, religious beliefs, or political affiliation [1]. Cyberbullying refers to the use of electronic communication media to harass and intimidate people by sending them malicious messages via platforms such as social media, instant messaging, or text messages. An online bully is an individual who uses the internet, cell phones, or other technological devices to send harmful, shaming, threatening, tormenting, humiliating, or intimidating emails, or to upload text or photographs on social media platforms, with the intention of harming their victim [1].

Individuals have found social media platforms an easier way to communicate their thoughts, feelings, and emotions to their peers. Cyberbullying perpetrated on a social media platform can have detrimental effects on the victim's physical and emotional well-being; in extreme cases, it can even result in suicidal thoughts, self-harm, and loss of life. Research reveals that cyberbullying primarily targets teenagers and young adults. Owing to the large number of young people actively using social media platforms like X, cyberbullying has become a significant problem that increasingly affects the online community [2]. An efficient approach to tackling this problem is to detect and encrypt bullying messages before they are delivered to the intended recipient. The purpose of this study is to enhance the current cyberbullying detection system by using the stacking ensemble technique.

II. RELATED WORK

The research work of [3] focused on neural network frameworks with parameter optimization. It involved a comparison of eleven classification algorithms, of which logistic regression yielded the best result, while Bi-GRU and Bi-LSTM performed best among the neural networks used. The researchers' proposed shallow neural network outperformed existing state-of-the-art techniques, achieving an accuracy of 95% and an F1 score of 98%.

The authors of [4] used four deep learning models, namely convolutional neural network (CNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BLSTM), and Contextual Long Short-Term Memory (CLSTM), and five machine learning models, namely Naive Bayes, Support Vector Machine, IBK, Logistic Regression, and JRip, to detect abusive phrases in Urdu and Roman Urdu comments. Their results revealed that CNN performed best, with accuracies of 96.2% and 91.4% for the one-layer and two-layer designs, respectively. They concluded that deep learning models outperform machine learning models and that deep learning models with one-layer designs generate more accurate results than those with two-layer designs.

IJISRT24JUL1058 www.ijisrt.com 2278



The research work of [5] used supervised learning to detect and stop cyberbullying on the X platform. The model is based on support vector machines (SVM), Naive Bayes, and a TF-IDF vectorizer for data mining. The researchers developed a methodology to protect people from online media threats, and the results showed that support vector machines outperformed Naive Bayes in identifying social media bullying content.

Data from two prominent sources of cyberbullying were used by [6]: hate speech posts from X and personal assault comments from Wikipedia. The researchers created a model for detecting cyberbullying in text data using natural language processing and machine learning, adopting three feature extraction methods and four classifiers to determine the optimal technique. The developed model yielded an accuracy of over 90% on the X posts and 80% on the Wikipedia data.

Embedded sentiment and lexicon features were used by [7] in a supervised machine learning approach to detect cyberbullying on the X platform and to classify the severity of bullying into multiple classes. Random Forest, Support Vector Machine, Naïve Bayes, Decision Tree, and KNN were the machine learning techniques applied to the extracted features. The findings indicated that the developed framework offered a feasible option for identifying cyberbullying instances and assessing their severity in online social networks, and that, after comparing the results of the baseline features and the proposed features across the different machine learning techniques, the proposed features are as important as the baseline features in detecting cyberbullying.

The researchers in [8] used supervised machine learning techniques to identify and address instances of cyberbullying, employing multiple classifiers to train on and detect bullying behavior. The approach recommended by the study outperformed SVM on the cyberbullying dataset, achieving an accuracy of 92.8% compared with SVM's 90.3%. On the same dataset, a neural network (NN) demonstrated superior performance compared with other classifiers performing similar tasks.

These studies used a range of machine learning and deep learning techniques, showcasing the effectiveness of different models and approaches in detecting cyberbullying. The identified research gaps include the need for ensemble techniques combining multiple models, handling imbalanced data, improving detection in different languages, providing granular categorization of cyberbullying severity, comparing effectiveness across social media platforms, enhancing real-time detection and deployment, and using comprehensive evaluation metrics beyond accuracy and F1 score. This research builds on these works by proposing a stacking ensemble technique combining SVM, KNN, and logistic regression to improve cyberbullying detection accuracy on X post datasets.

III. METHODOLOGY

This research used a stacking classifier made up of three machine learning algorithms: Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbor (KNN).

A. Stacking Classifier
StackingClassifier is a class in scikit-learn that combines two or more classification models with the aim of improving performance (its counterpart, StackingRegressor, does the same for regression models). The structure comprises two tiers of estimators. The first tier consists of the base models, which make predictions on the input data. The second tier consists of a meta-classifier (or meta-regressor) that generates the final predictions by using the base models' predictions as its input; in scikit-learn, the meta-estimator is trained on cross-validated predictions that the base models make on the training data.

B. Support Vector Machine
A support vector machine separates different classes with a hyperplane in a multidimensional space. The SVM model constructs the hyperplane iteratively to minimize error. The objective of an SVM is to partition a dataset into distinct groups by identifying the hyperplane with the largest margin [9].

C. Logistic Regression
Logistic regression (LR) is an algorithm that uses the logistic function to create a decision boundary (a hyperplane) between two classes. It combines the attributes (inputs) to produce a prediction corresponding to the probability that the given input belongs to a particular class [7].

D. K-Nearest Neighbor
K-Nearest Neighbor (KNN) is a fundamental and comparatively simple machine learning algorithm, mostly used for classification; like other supervised models, it uses input variables to predict output values. A data point is classified according to the majority class of its k nearest neighbors among the previously stored data points, as measured by a similarity (distance) metric [10].

Fig. 1 illustrates the division of the system development process into smaller, interrelated sub-activities to accomplish the research objective.
Fig 1: System Design of Cyberbullying Detection System
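The two-tier structure described in Section III-A can be sketched with scikit-learn's StackingClassifier. This is a minimal sketch under stated assumptions, not the authors' code: the TF-IDF features, the linear SVC kernel, n_neighbors=3, the logistic-regression meta-classifier, cv=3, and the helper name build_stacking_model are all illustrative choices the paper does not specify, and the toy posts are hypothetical.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_stacking_model(n_neighbors: int = 3, cv: int = 3):
    """Hypothetical helper: TF-IDF features feeding a two-tier stack.

    All hyperparameters here are assumptions, not values from the paper.
    """
    # Tier 1: the three base learners named in the paper.
    base_learners = [
        ("svc", SVC(kernel="linear")),
        ("knn", KNeighborsClassifier(n_neighbors=n_neighbors)),
        ("lr", LogisticRegression(max_iter=1000)),
    ]
    # Tier 2: a meta-classifier trained on the base learners' out-of-fold
    # (cross-validated) predictions; logistic regression is an assumed choice.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,
    )
    # Convert raw post text into TF-IDF vectors before the stack sees it.
    return make_pipeline(TfidfVectorizer(), stack)


# Hypothetical toy posts: 1 = cyberbullying, 0 = not cyberbullying.
posts = [
    "you are stupid and worthless",
    "nobody likes you loser",
    "go away you ugly idiot",
    "you are a pathetic loser",
    "shut up idiot nobody cares",
    "you are so ugly and stupid",
    "have a great day friend",
    "thanks for sharing this article",
    "congratulations on your new job",
    "lovely photo from the beach",
    "see you at the meeting tomorrow",
    "great game last night everyone",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

model = build_stacking_model()
model.fit(posts, labels)
predictions = model.predict(["you stupid loser", "have a lovely day"])
print(predictions)
```

Training the meta-classifier on out-of-fold predictions (the cv argument) keeps it from learning from base-learner outputs on data those learners were fit on; setting passthrough=True would additionally pass the TF-IDF features to the meta-classifier. The paper does not say which of these settings it used.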

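The prediction rules of the three base learners in Sections III-B to III-D can be made concrete with a from-scratch Python sketch. It is illustrative only and rests on assumptions: a linear SVM, Euclidean distance for KNN (the paper only says "a similarity metric"), and hand-picked, hypothetical weights and 2-D feature vectors. It shows how each model turns an input into a label, not how the weights are learned, and all function names are hypothetical.

```python
import math
from collections import Counter


def linear_svm_predict(weights, bias, x):
    """Linear SVM rule: class 1 if x lies on the non-negative side of w·x + b = 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else 0


def svm_margin_width(weights):
    """Distance between the hyperplanes w·x + b = +1 and w·x + b = -1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))


def sigmoid(z):
    """Numerically stable logistic function mapping any real score into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_predict(weights, bias, x, threshold=0.5):
    """Return (probability of class 1, thresholded 0/1 label) for one input."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p, 1 if p >= threshold else 0


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k stored points closest to x (Euclidean distance).

    An odd k avoids tied votes for binary labels.
    """
    nearest = sorted(zip((math.dist(p, x) for p in train_x), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (e.g., two keyword scores per post).
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(linear_svm_predict([1.0, 1.0], -5.0, (4.0, 3.0)))  # score = 2.0 -> 1
print(svm_margin_width([3.0, 4.0]))                      # 2 / 5 -> 0.4
print(logistic_predict([2.0, -1.0], 0.0, (0.0, 1.0)))    # z = -1 -> p about 0.269, label 0
print(knn_predict(train_x, train_y, (0.5, 0.5)))         # three class-0 neighbours -> 0
```

In a stack, scores and probabilities like these are what the second-tier meta-classifier receives as its input features.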

IV. RESULTS AND DISCUSSION

A. Result Presentation
The objective of this study is to enhance the current cyberbullying detection model. Our study employed logistic regression, KNN, SVC, and the stacking method, which achieved accuracies of 85.26%, 72.44%, 84.01%, and 85.52%, respectively. The stacking approach achieved the highest performance, with an accuracy of 85.52%.

The model was developed using the X dataset, comprising 16,851 posts, each annotated as none, sexism, or racism, with a related binary label of 1 or 0 denoting the presence or absence of cyberbullying, respectively. The label is imbalanced: 5,347 posts are classified as cyberbullying and 11,501 as non-cyberbullying, roughly one cyberbullying post for every two non-cyberbullying posts. For model building, after pre-processing, the dataset was partitioned into 80% training data and 20% testing data and used to train four algorithms: logistic regression, KNN, SVC, and the stacking approach, as shown in Table 1. The algorithms' performance was evaluated using accuracy and F1-score.

B. Performance Evaluation
We assessed the effectiveness of the improved prediction model using accuracy and F1-score, computed with scikit-learn. The results are shown in Table 1 and Fig. 2. Logistic Regression had an accuracy of 85.26%, K-Nearest Neighbor (KNN) 72.44%, Support Vector Classifier (SVC) 84.01%, and the stacking ensemble 85.52%. The corresponding F1-scores were 85.26%, 75.64%, 84.89%, and 85.52%.

Table 1: Machine Learning Algorithm Results

Model                  Accuracy   F1-score
Logistic Regression    85.26%     85.26%
KNN                    72.44%     75.64%
SVC                    84.01%     84.89%
Stacking Method        85.52%     85.52%

Fig 2: Accuracy of the Machine Learning Algorithms.

C. Physical Implementation/Deployment
The physical implementation of the Cyberbullying Detection System is divided into three major parts: the user interface (frontend), developed with HyperText Markup Language (HTML), Cascading Style Sheets (CSS), Bootstrap, and JavaScript; the backend, developed with the Flask framework, a lightweight Python web framework, which manages the logic and interactions between the user interface and the machine learning model; and the machine learning prediction model, built with Python (pandas, NumPy, scikit-learn, NLTK, Seaborn, and Matplotlib). The system is deployed as a web application, making it accessible via web browsers. This allows for easy integration into social media platforms, providing real-time detection of cyberbullying. This deployment framework is intended to make the cyberbullying detection system robust and user-friendly, and thus a valuable tool for curbing cyberbullying on social media platforms. Fig. 3 and Fig. 4 depict aspects of the user interface.

Fig 3: Home Section of the User Interface


Fig 4: Interface for Post Input
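The evaluation protocol in Section IV (an 80/20 train/test split, then accuracy and F1-score) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline, and it makes two assumptions: the split is stratified by class (the paper does not say), and F1 is the binary score of the cyberbullying class, which is what scikit-learn's f1_score computes by default (average="binary"); the paper does not state its averaging mode. The function names and the ten example predictions are hypothetical; the class counts are the ones reported above.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping each class's share roughly equal in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Binary F1-score: harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Class counts reported in the paper; the label values stand in for real posts.
labels = [1] * 5347 + [0] * 11501
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 13479 3369

# Hypothetical predictions on ten test posts (three are cyberbullying).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))  # 8 of 10 correct -> 0.8
print(f1(y_true, y_pred))        # precision = recall = 2/3 -> F1 = 2/3
```

On imbalanced labels the two metrics can diverge, as in this toy case (0.8 accuracy against an F1 of about 0.67), which is why reporting both, and ideally precision and recall, gives a fuller picture.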


V. CONCLUSION AND RECOMMENDATIONS

This research successfully developed an improved Cyberbullying Detection System by leveraging a stacking ensemble technique that combines Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbor (KNN) algorithms. The model achieved an accuracy of 85.52% on an X post dataset, demonstrating its effectiveness in detecting cyberbullying. The user interface, designed with HTML, CSS, Bootstrap, and JavaScript and coupled with a Flask-based backend, provides a user-friendly and accessible platform for real-time cyberbullying detection. This system can be integrated into social media applications to help mitigate the rising incidence of cyberbullying and make social media environments safer for users. Future research should focus on enhancing the model to detect cyberbullying in region-specific languages such as Pidgin English. Research could also address data imbalance to improve model robustness and explore the model's effectiveness across various social media platforms. Other research areas include developing real-time deployment strategies to ensure seamless integration and operation, and using comprehensive evaluation metrics, including precision, recall, and user feedback, to provide a more dependable assessment of model performance.

REFERENCES

[1]. P. Ziman, C. Gaikwad, and A. Mhatre, "Detection of cyberbullying incidents on Instagram social network," Intl. J. of Res. in Eng. and Sci., vol. 9, pp. 6–13, 2021.
[2]. J. Mani and J. P. Sainudeen, "A machine learning approach towards social media to tackle cyberbullying," Intl. J. of Adv. Res. Id. and Inn. in Tech., vol. 4, pp. 495–498, 2018.
[3]. Raj, A. Agarwal, G. Bharathy, B. Narayan, and M. Prasad, "Cyberbullying detection: hybrid models based on machine learning and natural language processing techniques," Electronics, vol. 10, November 2021. https://doi.org/10.3390/electronics10222810
[4]. M. P. Akhter, Z. Jiangbin, I. R. Naqvi, M. AbdelMajeed, and T. Zia, "Abusive language detection from social media comments using conventional machine learning and deep learning approaches," Mult. Sys., vol. 28, pp. 1925–1940, April 2021. https://doi.org/10.1007/s00530-021-00784-8
[5]. S. S. Jikriya, "Cyber bullying detection in social media using supervised ML & NLP techniques," Intl. J. for Res. in App. Sc. and Eng. Tech., vol. 9, pp. 2259–2264, June 2021. https://doi.org/10.22214/ijraset.2021.35483
[6]. S. Kangane, P. Thorat, S. Indalkar, P. Yewale, and D. Deotale, "Detection of cyberbullying on social media using machine learning," Intl. J. for Res. in App. Sc. and Eng. Tech., vol. 9, pp. 1401–1409, June 2022. https://doi.org/10.22214/ijraset.2021.38635
[7]. Talpur and D. O'Sullivan, "Cyberbullying severity detection: A machine learning approach," PLOS ONE, vol. 15, October 2020. https://doi.org/10.1371/journal.pone.0240924
[8]. J. Hani, M. Nashaat, M. Ahmed, Z. Emad, E. Amer, and A. Mohammed, "Social media cyberbullying detection using machine learning," Int. J. of Adv. Comp. Sc. and Appl., vol. 10, 2019. https://doi.org/10.14569/ijacsa.2019.0100587
[9]. Van Hee, G. Jacobs, C. Emmery, B. Desmet, E. Lefever, B. Verhoeven, G. De Pauw, W. Daelemans, and V. Hoste, "Automatic detection of cyberbullying in social media text," PLOS ONE, vol. 13, October 2018. https://doi.org/10.1371/journal.pone.0203794
[10]. A. Kumar, "KNN Algorithm: When? Why? How?," Towards Data Science (Medium). https://towardsdatascience.com/knn-algorithm-what-when-why-how-41405c16c36f
