Machine Learning Based Cyber Bullying Detection

The document discusses machine learning approaches for cyberbullying detection. It proposes using natural language processing and classification algorithms like logistic regression, decision trees, and naive bayes to analyze text and detect abusive patterns. The system aims to accurately identify cyberbullying with high precision, recall, and specificity. Existing approaches that use support vector machines, deep learning models, and other machine learning techniques for cyberbullying detection are also reviewed.


Volume 8, Issue 4, April 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Machine Learning based Cyber Bullying Detection

Kundharapu Vasudeva, Bestha Raghavendra Raj Kiran, Shaik Vaseem Akram, Bandaru Vijaya Prakash
Dept. of CSE
Madanapalle Institute Of Technology and Science

Abstract:- Cyber bullying is a serious issue that affects individuals of all ages, particularly children and teenagers who are more vulnerable to online harassment. With the growing use of social media and other online platforms, it has become increasingly important to develop effective methods to detect and prevent cyber bullying. In this project, we propose a machine learning-based approach for cyber bullying detection. The proposed system uses natural language processing (NLP) techniques to analyse text messages and identify patterns of abusive and aggressive behaviour. We apply various classification algorithms, such as Logistic Regression, Decision Tree Classifier and Gaussian Naïve Bayes, to train our model and evaluate its performance. We also explore the use of ensemble methods, such as Random Forest classifier and AdaBoost classifier, to improve the accuracy of our model. We use publicly available datasets to test our system and compare its performance with other existing approaches. Our results show that the proposed machine learning-based approach can effectively identify cyber bullying with high accuracy, sensitivity, and specificity. This project has significant implications for the development of automated systems that can help protect individuals from online harassment and promote a safer and more inclusive online environment.

Keywords:- Cyberbullying, Harassment, Machine Learning, Natural Language Processing, Social Media Analysis, Text Classification, Logistic Regression, Decision Tree Classifier, Gaussian Naïve Bayes, Ensemble Methods, AdaBoost Classifier, Random Forest Classifier, Sentiment Analysis and Behavioural Analysis.

I. INTRODUCTION

Cyber bullying is a type of online harassment that involves the use of electronic communication to bully, intimidate, or threaten others. It can take various forms, such as sending threatening messages, sharing personal information without consent, spreading rumours, or posting insulting comments on social media platforms. Cyber bullying can have severe consequences, including depression, anxiety, low self-esteem, and even suicide. Thus, it is essential to detect and prevent cyber bullying to ensure the safety and well-being of individuals who use online platforms.

Traditional approaches to detecting cyber bullying involve manual monitoring of online platforms, which can be time-consuming and expensive. With the growing volume of online content, it is becoming increasingly challenging to monitor and moderate online platforms effectively. Therefore, there is a need for automated systems that can identify and flag potentially abusive content quickly and accurately. In recent years, machine learning techniques have shown great promise in detecting cyber bullying. These techniques use natural language processing (NLP) algorithms to analyse text messages and identify patterns of abusive and aggressive behaviour. A significant advantage of machine learning-based methods over traditional rule-based systems is their ability to adjust to evolving trends and patterns of cyberbullying, which makes them more effective over time.

In this project, we propose a machine learning-based approach for cyber bullying detection. We aim to develop an automated system that can accurately detect and flag potentially abusive content on online platforms. We apply various classification algorithms, such as logistic regression, decision trees, and Gaussian Naïve Bayes, to train our model and evaluate its performance. We also explore the use of ensemble methods, such as the Random Forest classifier, to improve the accuracy of our model. In the next section, we provide a brief overview of related work in cyber bullying detection. We then describe our proposed machine learning-based approach in detail and discuss the datasets and evaluation metrics used in our experiments. We present and analyse the results of our experiments and compare our approach's performance with existing approaches. Finally, we conclude the project and discuss future work.

II. RELATED WORK

[1] Cyberbullying is a growing concern with the increased use of social media and online communities. Detecting and preventing cyberbullying is crucial in ensuring the mental and physical well-being of individuals, especially children and women. [2] To address this issue, various studies have proposed the use of machine learning and natural language processing techniques to automatically detect cyberbullying.

[3] In a study conducted in May 2022, the authors proposed the use of Support Vector Machines (SVM) to identify cyberbullying on Twitter, and Optical Character Recognition (OCR) to detect image-based cyberbullying. [4] They categorized existing approaches into four main classes: supervised learning, lexicon-based, rule-based, and mixed-initiative approaches.

[5] Another study conducted in December 2021 highlighted the research gap in resource-poor languages such as Roman Urdu, which is widely used in South Asian countries. The authors performed extensive pre-processing on the Roman Urdu microtext, including the creation of a slang-phrase dictionary and elimination of cyberbullying domain-specific stop words. [6] They experimented with different models, including RNN-LSTM, RNN-BiLSTM, and CNN models, achieving validation accuracy of up to 85.5%.

IJISRT23APR2306 www.ijisrt.com 2070

[7] A study from April 2021 proposed a technique to detect online abusive and bullying messages using natural language processing and machine learning. The study used Bag-of-Words (BoW) and term frequency-inverse document frequency (TF-IDF) features and evaluated the accuracy of four machine learning algorithms. [8] The authors concluded that their proposed technique can accurately detect instances of cyberbullying language on social media platforms.
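
As a rough illustration of the two feature types named above, the sketch below computes Bag-of-Words counts and TF-IDF weights for a toy corpus in plain Python. The example messages, the whitespace tokenizer, and the unsmoothed weighting tf × log(N / df) are assumptions made for illustration. They are not taken from the cited study, and libraries such as scikit-learn use smoothed variants of this formula.

```python
import math
from collections import Counter

# Toy corpus (hypothetical messages, for illustration only).
docs = [
    "you are so stupid",
    "you are so kind",
    "have a nice day",
]

def tokenize(text):
    # Assumption: lowercase and split on whitespace.
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]

# Bag-of-Words: raw term counts per document.
bow = [Counter(tokens) for tokens in tokenized]

# Document frequency: number of documents that contain each term.
df = Counter(term for tokens in tokenized for term in set(tokens))
n_docs = len(docs)

def tfidf(counts):
    # Unsmoothed TF-IDF: (count / document length) * log(N / df).
    total = sum(counts.values())
    return {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}

weights = [tfidf(c) for c in bow]

print(bow[0])
print(weights[0])
```

In this toy corpus, a term that occurs in only one message (such as "stupid") receives a higher weight than terms shared by several messages (such as "you"), which is the property that makes TF-IDF useful for picking out distinctive words.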

[9] Other existing systems include cyberbullying detection using deep transfer learning, a cyberbullying identification system based on deep learning algorithms, and social media cyberbullying detection using machine learning. [10] While some studies have achieved promising results, the detection of cyberbullying remains a challenging task due to the ever-evolving nature of language and the complexity of social dynamics on online platforms.

Overall, the proposed techniques and algorithms for cyberbullying detection provide a foundation for ongoing research efforts to combat cyberbullying and online harassment effectively.

III. OTHER EXISTING SYSTEMS

• Deep Learning Algorithm for Cyberbullying Detection
• Social Media Cyberbullying Detection using Machine Learning
• Cyber-Bullying Detection using Machine Learning Algorithms
• Detection of Cyberbullying on Social Media Using Machine Learning
• Cyberbullying Detection on Social Networks Using Machine Learning Approaches
• Cyberbullying Identification System Based on Deep Learning Algorithms
• Cyberbullying Detection Using Deep Transfer Learning

IV. PROPOSED SYSTEM

Fig. 1: Architecture

Fig. 2: Proposed System Model

• Data Collection: The first step in building a machine learning model for cyberbullying detection is to collect a dataset of text examples that have been labelled as either containing cyberbullying or not. This dataset can be collected manually by experts or by using automated tools that scan social media platforms for cyberbullying-related posts.
• Data Pre-processing: Once the dataset is collected, the next step is to pre-process the text data. This includes tasks such as tokenization, removing stop words, stemming or lemmatization, and converting text into numerical representations such as word embeddings or Bag of Words.
• Model Selection: After pre-processing the data, the next step is to select an appropriate machine learning model for the task of cyberbullying detection. As mentioned earlier, an ensemble method such as a Random Forest classifier is used for this project.
• Model Training: The selected model is then trained on the pre-processed data. The training process involves feeding the model the pre-processed data along with the corresponding labels and updating the model parameters to minimize the classification error.
• Model Evaluation: After training the model, it is important to evaluate its performance on a separate dataset to measure its accuracy, precision, recall, and F1 score. If the model performance is not satisfactory, the model hyperparameters can be adjusted, or a different model architecture can be selected.
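
The five steps above can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions: the tiny inline dataset, the TF-IDF representation, the 25% test split, and all parameter values are placeholders chosen for illustration, not the dataset or settings used in this project. Stemming and lemmatization are omitted for brevity.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Data collection: hypothetical labelled examples (1 = cyberbullying, 0 = not).
texts = [
    "you are worthless and everyone hates you",
    "nobody likes you, just leave",
    "you are such an idiot",
    "shut up you loser",
    "go away, you ugly freak",
    "you are pathetic and stupid",
    "thanks for helping me with my homework",
    "have a great day at school",
    "congratulations on winning the match",
    "see you at the library tomorrow",
    "that was a really good presentation",
    "happy birthday, hope you enjoy it",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out a separate test set for the evaluation step.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Pre-processing: lowercase, drop English stop words, convert to TF-IDF vectors.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Model selection and training: a Random Forest ensemble.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Model evaluation: precision, recall and F1 on the held-out data.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

The vectorizer is fitted on the training messages only and then reused on the test messages, so that no information from the evaluation data leaks into the features.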

V. BLOCK DIAGRAM

Fig. 3: Block Diagram

VI. METHODOLOGY AND ALGORITHMS

A. Logistic regression:
Logistic Regression is a popular supervised machine learning algorithm. Here it is used to perform text classification on TF-IDF features with random oversampling. Here is an overview of what the code does:

• A logistic regression classifier (lgr) is instantiated.
• The random oversampled data (X_over, y_over) is used to train the classifier using the fit() method.
• The classifier is used to predict the target variable (y_pred) for the testing data (X_test) using the predict() method.
• The accuracy of the classifier is calculated using the accuracy_score() function from the Scikit-learn metrics module.
• The confusion matrix of the classifier's predictions is calculated using the confusion_matrix() function from the Scikit-learn metrics module.
• The getStatsFromModel() function is called to calculate and print additional evaluation metrics such as precision, recall, and F1-score.

The output summarizes the performance of a logistic regression classifier that was trained using the random oversampled data. The confusion matrix provides a detailed breakdown of the true positive, true negative, false positive, and false negative predictions, while the additional evaluation metrics provide more insight into the classifier's performance, including its ability to detect cyber bullying accurately. Depending on the performance metrics, further adjustments may be needed to optimize the model, such as using a different algorithm or tweaking the hyperparameters.

B. Naive Bayes classifiers:
Naïve Bayes is a supervised learning algorithm that is based on Bayes' theorem and used for solving classification problems. This section evaluates the performance of a Gaussian Naive Bayes classifier for detecting cyberbullying. Here is an overview of what the code does:

• A Gaussian Naive Bayes classifier (gnb) is instantiated.
• The random oversampled data (X_over, y_over) is used to train the classifier using the fit() method.
• The classifier is used to predict the target variable (y_pred) for the testing data (X_test) using the predict() method.
• The score of the classifier is calculated using the score() method of the Gaussian Naive Bayes model.
• The confusion matrix of the classifier's predictions is calculated using the confusion_matrix() function from the Scikit-learn metrics module.
• The getStatsFromModel() function is called to calculate and print additional evaluation metrics such as precision, recall, and F1-score.

The output summarizes the performance of a Gaussian Naive Bayes classifier that was trained using the random oversampled data. The score indicates the percentage of correctly classified instances in the testing data. The confusion matrix provides a detailed breakdown of the true positive, true negative, false positive, and false negative predictions, while the additional evaluation metrics provide more insight into the classifier's performance, including its ability to detect cyberbullying accurately.
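
The workflow described above for the logistic regression and Gaussian Naive Bayes classifiers can be sketched as follows. The names lgr, gnb, X_over, y_over, X_test and y_pred follow the text. Everything else is an assumption made for illustration: the synthetic data from make_classification stands in for the vectorised text, the oversampling is done with sklearn.utils.resample rather than whatever tool the project used, and get_stats_from_model is a hypothetical stand-in for the getStatsFromModel() helper, whose source is not shown in this paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

# Synthetic, imbalanced stand-in for the vectorised text (assumption).
X, y = make_classification(n_samples=400, n_features=20, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Random oversampling: resample the minority class with replacement
# until both classes have the same number of training rows.
minority = X_train[y_train == 1]
majority = X_train[y_train == 0]
minority_up = resample(minority, replace=True, n_samples=len(majority),
                       random_state=0)
X_over = np.vstack([majority, minority_up])
y_over = np.array([0] * len(majority) + [1] * len(minority_up))

def get_stats_from_model(y_true, y_pred):
    # Hypothetical helper: precision, recall and F1 for the positive class.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0)
    return {"precision": precision, "recall": recall, "f1": f1}

results = {}
for name, clf in [("lgr", LogisticRegression(max_iter=1000)),
                  ("gnb", GaussianNB())]:
    clf.fit(X_over, y_over)        # train on the oversampled data
    y_pred = clf.predict(X_test)   # predict on the untouched test data
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        **get_stats_from_model(y_test, y_pred),
    }
    print(name, results[name])
```

Only the training split is oversampled here, so the test data keeps its original class balance. Note that GaussianNB expects dense arrays, so a sparse TF-IDF matrix would need to be converted (for example with toarray()) before being passed to it.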

The effectiveness of the Gaussian Naive Bayes classifier in detecting cyberbullying can be evaluated based on these performance metrics. Depending on the performance metrics, further adjustments may be needed to optimize the model, such as using a different algorithm or tweaking the hyperparameters.

C. Decision Trees Classifier:
This section evaluates the performance of a decision tree classifier for detecting cyberbullying. Here is an overview of what the code does:

• The decision tree classifier (dtc) is instantiated.
• The random oversampled data (X_over, y_over) is used to train the classifier using the fit() method.
• The classifier is used to predict the target variable (y_pred) for the testing data (X_test) using the predict() method.
• The accuracy of the classifier is calculated using the accuracy_score() function from the Scikit-learn metrics module.
• The confusion matrix of the classifier's predictions is calculated using the confusion_matrix() function from the Scikit-learn metrics module.
• The getStatsFromModel() function is called to calculate and print additional evaluation metrics such as precision, recall, and F1-score.

The accuracy score and the confusion matrix give an overview of the classifier's performance. The accuracy score indicates the percentage of correctly classified instances in the testing data. The confusion matrix provides a detailed breakdown of the true positive, true negative, false positive, and false negative predictions, which helps in understanding the classifier's strengths and weaknesses. Additionally, evaluation metrics such as precision, recall, and F1-score can be reported to provide a more detailed analysis of the classifier's performance. These metrics help in identifying which class is misclassified more often and give insight into areas where the model can be improved. The decision tree classifier can be evaluated based on these performance metrics to determine its effectiveness in detecting cyberbullying. Depending on the performance metrics, further adjustments may be needed to optimize the model, such as using a different algorithm or tweaking the hyperparameters.

D. Random Forest:
This section evaluates the performance of a random forest classifier for detecting cyberbullying. Here is an overview of what the code does:

• A random forest classifier (rfc) is instantiated, with verbose set to True to show the training progress.
• The random oversampled data (X_over, y_over) is used to train the classifier using the fit() method.
• The classifier is used to predict the target variable (y_pred) for the testing data (X_test) using the predict() method.
• The score of the classifier is calculated using the score() method of the random forest model.
• The confusion matrix of the classifier's predictions is calculated using the confusion_matrix() function from the Scikit-learn metrics module.
• The getStatsFromModel() function is called to calculate and print additional evaluation metrics such as precision, recall, and F1-score.

The output summarizes the performance of a random forest classifier that was trained using the random oversampled data. The score indicates the percentage of correctly classified instances in the testing data. The confusion matrix provides a detailed breakdown of the true positive, true negative, false positive, and false negative predictions, while the additional evaluation metrics provide more insight into the classifier's performance, including its ability to detect cyberbullying
accurately. The effectiveness of the random forest classifier in detecting cyberbullying can be evaluated based on these performance metrics. Depending on the performance metrics, further adjustments may be needed to optimize the model, such as using a different algorithm or tweaking the hyperparameters.

VII. RESULT

Of all the algorithms used in this project, the Random Forest algorithm gives the highest accuracy and precision. We therefore used Random Forest to predict cyberbullying.

Table 1: Algorithm Result

S.No.  Algorithm Name             Accuracy (%)
1      Logistic Regression        81
2      Decision Tree              84
3      Gaussian Naïve Bayes       62
4      Random Forest Classifier   92

VIII. CONCLUSION AND FUTURE SCOPE

This paper compares various supervised machine learning algorithms and ensemble methods for detecting cyberbullying. The Random Forest classifier performed best, with an accuracy of 92%, while Gaussian Naive Bayes was the least accurate, with an accuracy of only 62%. Future work includes deploying the system in real time and collaborating with companies.

REFERENCES

[1.] Miss. Jafri S A., Miss Wagh Roshani B., Miss Gaikwad V. Subodh., Miss Sonawane U D, International Research Journal of Modernization in Engineering Technology and Science.
[2.] https://www.irjmets.com/uploadedfiles/paper/issue_5_may_2022/24749/final/fin_irjmets1653789970.pdf
[3.] Amirita Dewani, Mohsin A M., Sania B., "Cyberbullying detection: advanced pre-processing techniques & deep learning architecture for Roman Urdu data", Journal of Big Data, Dec 2021.
[4.] https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00550-7
[5.] Md Manowarul., Md Ashraf., Linta., Arnisha., Selina., Uzzal., "Cyberbullying Detection on Social Networks Using Machine Learning Approaches", 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE).
[6.] https://ieeexplore.ieee.org/document/9411601/authors#authors
[7.] Akhter A., Islam L., Uddin A Md., Islam M., "Cyberbullying Detection on Social Networks Using Machine Learning Approaches", Apr 2021.
[8.] https://www.researchgate.net/publication/351131976_CSyberbullying_Detection_on_Social_Networks_Using_Machine_Learning_Approaches
[9.] Giovanni B., Chanhee Shin, Nishal K., "Cyberbullying Detection System", Jun 2020.
[10.] https://engineering.ucdenver.edu/docs/librariesprovider29/college-of-engineering-and-applied-science/sp2020-capstone/csci14-report.pdf?sfvrsn=d3731fb9_2
[11.] Monirah AAA., Mourad Y., "International Journal of Advanced Computer Science and Applications (IJACSA)", 2018.
[12.] John H., Mohamed N., Mostafa A., Zeyad E., Eslam A., Ammar M., "International Journal of Advanced Computer Science and Applications (IJACSA)", 2019.
[13.] Mangala K., Anvitha K., Deepa, Deepika K V., Divya C H, "Cyber-Bullying Detection using Machine Learning Algorithms", IJCRT.
[14.] https://jpinfotech.org/detection-of-cyberbullying-on-social-media-using-machine-learning/
[15.] https://www.irjet.net/archives/V9/i5/IRJET-V9I5562.pdf
[16.] https://www.mdpi.com/2079-9292/11/20/3273/pdf