CBDA Research Paper
1 KARAN RANA, 2 SUHAIL SAIFI, 3 MOHD. ZUFAR HASAN ALVI, 4 SOHAIL
1 [email protected], STUDENT, CSE, SRMIST NCR CAMPUS
2 [email protected], STUDENT, CSE, SRMIST NCR CAMPUS
3 [email protected], STUDENT, CSE, SRMIST NCR CAMPUS
4 [email protected], STUDENT, CSE, SRMIST NCR CAMPUS
ABSTRACT
The proliferation of social media platforms has led to a significant rise in cyberbullying incidents,
which poses serious challenges for online safety and mental well-being. This paper presents a
comprehensive study on leveraging tweet data for automated cyberbullying detection through
advanced machine learning techniques. We propose a novel framework that employs natural
language processing (NLP) and machine learning algorithms to identify and classify
cyberbullying content within Twitter data. The framework integrates various feature extraction
methods and classification models to enhance detection accuracy. Our experimental results
demonstrate the effectiveness of the proposed approach, achieving high precision and recall
rates in distinguishing between abusive and non-abusive tweets. This research contributes to
the development of automated tools for monitoring and mitigating cyberbullying on social media
platforms, offering insights into the potential for improved online safety through technological
interventions.
a. Chat Interface
Text Input Field: A text entry field where users can type or paste their messages.
Action Buttons: Each flagged message includes action buttons that allow administrators to:
Review: View the full content of the flagged message and any associated analysis or comments.
Disable: Mark the content as inappropriate and disable it from being visible to users. This action helps in managing and moderating content effectively.
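The chat interface described above can be sketched with Tkinter. This is a minimal illustration, not the authors' implementation: the widget layout, the function names, and the keyword-based `classify_message` stand-in (in place of the trained ML model discussed later) are all assumptions made for the example.

```python
import tkinter as tk
from tkinter import messagebox

# Keyword list is a toy stand-in for the trained classifier; the real
# system would call the machine learning model described in the paper.
ABUSIVE_WORDS = {"idiot", "loser", "stupid"}  # illustrative only

def classify_message(text: str) -> bool:
    """Return True if the message should be flagged as potential cyberbullying."""
    tokens = text.lower().split()
    return any(word in ABUSIVE_WORDS for word in tokens)

def build_chat_window() -> tk.Tk:
    """Assemble the text input field and action buttons described above."""
    root = tk.Tk()
    root.title("CBDA Chat")

    entry = tk.Entry(root, width=50)   # Text Input Field
    entry.pack(padx=8, pady=4)
    log = tk.Listbox(root, width=60)   # message history / flagged list
    log.pack(padx=8, pady=4)

    def send():
        text = entry.get().strip()
        if not text:
            return
        flagged = classify_message(text)
        log.insert(tk.END, ("[FLAGGED] " if flagged else "") + text)
        entry.delete(0, tk.END)

    def review():
        sel = log.curselection()
        if sel:  # Review: view the full content of the flagged message
            messagebox.showinfo("Review", log.get(sel[0]))

    def disable():
        sel = log.curselection()
        if sel:  # Disable: remove the message from user-visible content
            log.delete(sel[0])

    tk.Button(root, text="Send", command=send).pack(side=tk.LEFT, padx=8)
    tk.Button(root, text="Review", command=review).pack(side=tk.LEFT)
    tk.Button(root, text="Disable", command=disable).pack(side=tk.LEFT)
    return root

# To launch the chat window: build_chat_window().mainloop()
```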
Fig. 2. Chatbox output

Admin Page for Cyberbullying Detection Platform
The admin page is a critical component of the cyberbullying detection platform, designed to provide administrators with tools to manage and monitor the platform's activities. This page offers functionalities to review detected instances of cyberbullying, disable inappropriate content, and oversee user interactions. The admin page enhances the system's control and moderation capabilities, ensuring a safer and more manageable environment.

a. Admin Interface
Dashboard Overview: The admin page includes an overview dashboard that summarizes key metrics, such as the number of flagged instances, active users, and recent activities. This overview helps administrators quickly assess the platform's status.
Flagged Content List: A list or table displaying all chats or messages flagged by the system as potential instances of cyberbullying. This list includes details such as the message content, user information, and the reason for flagging.

b. Review and Management
Detailed View: Administrators can click on individual flagged messages to access a detailed view, including the full content, detection results, and any notes or contextual information provided by the system.
Content Disabling: Administrators have the option to disable inappropriate content, which removes the flagged messages from user interactions and prevents them from being displayed on the platform.

c. User Management
User Profiles: The admin page provides access to user profiles, allowing administrators to review user activity and manage user permissions. This feature helps in identifying users who may be repeatedly involved in cyberbullying.
Account Actions: Administrators can perform actions such as suspending or deactivating user accounts based on their behaviour or involvement in flagged content.

3. Implementation
The admin page is implemented using a combination of Tkinter for desktop-based management and Streamlit for web-based visualization and interaction:
Tkinter: Provides the graphical interface for the admin page, including the list of
flagged content, action buttons, and user management tools. Tkinter's widgets are used to create a functional and organized layout for administrators.
Streamlit: Enhances the admin page with interactive elements and real-time updates. Streamlit's capabilities are used to display metrics, update flagged content status, and visualize data related to user interactions and flagged messages.

4. Security and Access Control
Authentication: Access to the admin page is restricted to authorized personnel only. Administrators must log in with special credentials to access the management tools and features.
Data Security: Sensitive information displayed on the admin page, such as user data and flagged content, is protected through encryption and secure data handling practices.

Because Hinglish, a code-mixed blend of Hindi and English, is widely used across Indian social media platforms, it presents unique challenges for text processing, requiring specialized handling during feature extraction and classification. To address this, we implement multiple machine learning algorithms to improve the prediction accuracy of cyberbullying detection. Multinomial Naive Bayes (NB), Decision Tree Classifier, AdaBoost Classifier, and Bagging Classifier are employed as core classification models. Each of these algorithms brings distinct advantages in terms of handling textual data, classification accuracy, and model robustness.
Accuracy is a critical metric in evaluating the performance of machine learning models, particularly in the context of classifying cyberbullying messages in Hinglish. It represents the proportion of correctly classified instances out of the total instances examined. In this study, we rigorously assess the accuracy of each of these four algorithms.
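A minimal sketch of how the four classifiers can be compared on accuracy with scikit-learn. The tiny hard-coded corpus and train/test split below are assumptions standing in for the labelled Hinglish tweet dataset; the classifier choices and TF-IDF features follow the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score

# Tiny illustrative corpus standing in for the labelled Hinglish tweets.
train_texts = ["tu idiot hai", "you are a loser", "chup kar stupid",
               "great match yaar", "have a nice day", "kal milte hain dost",
               "nobody likes you loser", "aaj ka din accha tha"]
train_labels = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = abusive, 0 = benign
test_texts = ["you stupid idiot", "nice day dost"]
test_labels = [1, 0]

# TF-IDF feature extraction, as used in the paper.
vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

models = {
    "Multinomial NB": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    # Accuracy = correctly classified instances / total instances.
    accuracies[name] = accuracy_score(test_labels, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.2f}")
```

On the real corpus, the same loop yields the per-model accuracy figures that the study compares.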
Conclusion
Thus, we have successfully been able to extract the data, clean it, and visualize it using various Python libraries. We also implemented various natural language processing techniques such as tokenization, lemmatization, and vectorization, i.e. feature extraction. After reading various research papers published in this field, we found that count vectorizer and TF-IDF are the two feature extraction methods that give the best results.

Fig. 6. Comparison between CV and TF-IDF

Future Scope
While the current models demonstrate promising results in classifying cyberbullying messages, there remains considerable potential for improving accuracy through the application of deep learning techniques. Deep learning models, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), especially Long Short-Term Memory (LSTM) networks, have shown exceptional capabilities in natural language processing tasks. These models can capture complex patterns and relationships in data, making them particularly effective for understanding the nuanced language used in cyberbullying. By employing these advanced architectures, we aim to achieve higher classification accuracy, reduce misclassifications, and enhance the overall robustness of the system.
Furthermore, the creation of annotated corpora that label instances of cyberbullying will be instrumental in training and validating the machine learning models effectively.
While the current model focuses on classifying cyberbullying solely through textual analysis, future iterations of this research can explore multimodal classification techniques. Cyberbullying often manifests not only in text but also in images and videos. Developing a model that can classify cyberbullying content across these different media types would provide a more holistic approach to detection. For instance, integrating image processing techniques, such as CNNs for visual data, alongside text analysis could enhance the system's ability to detect cyberbullying in scenarios where images or videos convey harmful or abusive messages.
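The tokenization, lemmatization, and vectorization steps summarized in the conclusion can be sketched in plain Python. The suffix-stripping lemmatizer here is a deliberately simplified stand-in for a real one (e.g. NLTK's WordNetLemmatizer), used only to keep the example self-contained; the sample tweets are invented.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lemmatize(token: str) -> str:
    """Toy suffix-stripping lemmatizer -- a stand-in for WordNetLemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def count_vectorize(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Count vectorizer: build a vocabulary and per-document term counts."""
    processed = [[lemmatize(t) for t in tokenize(d)] for d in docs]
    vocab = sorted({t for doc in processed for t in doc})
    rows = []
    for doc in processed:
        counts = Counter(doc)
        rows.append([counts.get(t, 0) for t in vocab])
    return vocab, rows

tweets = ["He keeps bullying others", "She visualized the tweets"]
vocab, matrix = count_vectorize(tweets)
print(vocab)
print(matrix)
```

TF-IDF differs from this count-vectorizer sketch only in that each count is reweighted by the inverse document frequency of its term across the corpus.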