Machine Learning-Based Strategies For Detecting Cyberbullying in Online Chats
Abstract:- This study employed the stacking of three machine learning techniques: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Logistic Regression algorithms to develop a model for detecting cyberbullying using a post dataset acquired from the X Platform. The proposed model's task is to extract keywords from the post dataset and then classify them as either 1 ("cyberbullying word") or 0 ("not cyberbullying word"). The model generated an accuracy of 85.52%, and it was deployed using a simple Graphical User Interface (GUI) web application. This study recommends that the model be included on social media platforms to help reduce the growing use of cyberbullying phrases.

Keywords:- Cyberbully, Machine Learning, Detection, Social Media.

I. INTRODUCTION

The Internet's global accessibility has significantly changed our perception of the world. Social media (SM) is a derivative of the World Wide Web; social media usage is becoming increasingly popular, and it encompasses a variety of forms, including online news forums, gaming platforms, and dating apps, as well as social networking sites (e.g., Instagram, Facebook, X, etc.). People of various ages, origins, and social and economic classes have gradually incorporated social networking into their lives. Social media facilitates connections between people from all around the world. X, a social media platform for opinion transmission and image/video sharing, has surely become one of the most popular social networking platforms, allowing users to upload photographs and videos for other users to view and comment on.

Recently, there have been growing concerns about the usage of social media platforms (especially X) to disseminate opinions that may be broadly categorized as offensive. These offensive posts may manifest as hate speech, cyberbullying, and other similar forms of content. Hate speech is speech that diminishes or disparages an individual or a collective based on their origin, ethnicity, sexuality, gender identity, disability, religious beliefs, and political affiliation [1]. Cyberbullying refers to the act of using electronic communication mediums to harass and intimidate people by sending them malicious messages via platforms such as social media, instant messaging, or digital texts. An online bully is an individual who uses the internet, cell phones, or other technological devices to send harmful, shaming, threatening, tormenting, humiliating, or intimidating emails, as well as upload text or photographs on social media platforms, with the intention of causing harm to their victim [1].

Individuals have found social media platforms an easier alternative to communicate their thoughts, feelings, and emotions to their peers. The act of cyberbullying perpetrated by a person on a social media platform can have detrimental effects on the victim's physical and emotional well-being; in extreme cases, it can even result in suicidal thoughts, self-harm, and loss of life. Research reveals that cyberattacks primarily target teenagers and young adults. Owing to the large number of young people who are actively using social media platforms like X, cyberbullying has become a significant problem that has increasingly affected the online community [2]. An efficient approach to tackling this problem is to detect and encrypt the bullying messages prior to their delivery to the intended recipient. The purpose of this study is to enhance the current cyberbullying detection system through the utilization of the stacking ensemble technique.

II. RELATED WORK

Research by [3] on parameterized optimization neural network frames was the focus of the research work. It involved an algorithmic comparison of eleven categorization algorithms, out of which logistic regression yielded the best result. Bi-GRU and Bi-LSTM performed the best out of the neural networks utilized. The researchers’ proposed shallow neural network outperformed the existing state-of-the-art techniques based on the accuracy and f1 score of 95% and 98%, respectively.

[4] used four deep learning models (convolutional neural network (CNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BLSTM), and Contextual Long Short-Term Memory (CLSTM)) and five machine learning models (Naive Bayes, Support Vector Machine, IBK, Logistic Regression, and JRip) for detecting abusive phrases in Urdu and Roman Urdu comments, respectively. Their results revealed that CNN had optimal performance with accuracies of 96.2% and 91.4% for one-layer and two-layer designs, respectively. They concluded based on the results that deep learning models outperform machine learning models and that deep learning models with one-layer designs generate more accurate results than two-layer designs.
B. Performance Evaluation
We assessed the effectiveness of the proposed model using two criteria: the accuracy and f1-score metrics from Scikit-learn, computed for each of the machine learning algorithms. The results are shown in Table 1 and Fig. 2. Logistic Regression had an accuracy score of 85.26%, K-Nearest Neighbor (KNN) had an accuracy score of 72.44%, the Support Vector Classifier (SVC) had an accuracy score of 84.01%, and the stacking ensemble had an accuracy score of 85.52%. The algorithms had f1-scores of 85.26%, 75.64%, 84.89%, and 85.52%, respectively.
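The evaluation described above can be illustrated with a minimal Scikit-learn sketch. It is not the study's implementation: the toy posts, the TF-IDF features, the linear SVC kernel, the logistic-regression meta-learner, the 3-fold internal cross-validation, the 75/25 split, and the binary f1 setting are all assumptions made for illustration, and the scores it yields on toy data bear no relation to the figures reported here.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy posts (not the study's dataset).
# Label 1 = cyberbullying, label 0 = not cyberbullying.
posts = [
    "you are so stupid and ugly", "nobody likes you loser",
    "shut up you idiot", "you are a worthless loser",
    "everyone hates you, go away", "you are dumb and pathetic",
    "what an ugly idiot you are", "you are pathetic, nobody likes you",
    "go away you stupid loser", "you dumb worthless idiot",
    "have a great day everyone", "thanks for sharing this photo",
    "congratulations on your new job", "what a lovely sunset tonight",
    "see you at the game tomorrow", "great photo, thanks for sharing",
    "happy birthday, have a lovely day", "good luck with the new job",
    "the game tonight was great", "lovely to see everyone tomorrow",
]
labels = [1] * 10 + [0] * 10

# Assumed 75/25 stratified split; the paper's split is not stated here.
X_train, X_test, y_train, y_test = train_test_split(
    posts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Base learners named in the paper; hyperparameters are assumptions.
base_learners = [
    ("svc", SVC(kernel="linear")),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("lr", LogisticRegression()),
]

# The meta-learner choice (logistic regression) is an assumption.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=3,
)

# TF-IDF turns each post into a sparse feature vector before stacking.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(f"accuracy={acc:.4f}  f1={f1:.4f}")
```

Fitting each base learner alone through the same pipeline and scoring it with `accuracy_score` and `f1_score` would give a per-algorithm comparison of the kind summarized in this section.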
REFERENCES