Abstract - This project proposes a novel approach for detecting cyberbullying attacks using an LSTM-CNN architecture and
custom word embeddings trained with word2vec. The model achieved improved accuracy in classifying tweets and comments as
bullying or non-bullying based on a toxicity score. The LSTM network is well suited to processing sequential data, while the CNN
layer extracts features from longer spans of text. The proposed approach addresses a weakness of existing ML and NLP models,
which often ignore sentence semantics.
Keywords - Cyberbullying, Social networking sites, Deep learning, Online harassment, Cyberbullying detection, Machine
learning, Social media platforms, Cyberbullying prevention, Online safety, Natural language processing, Text classification,
Sentiment analysis, Hate speech detection, User profiling, Cyberbullying intervention, Social network analysis, Online bullying
behavior, Cyberbullying awareness.
Nandhakumar G,
Deepa S,
1. INTRODUCTION
Cyberbullying is bullying that takes place over digital devices, such as cell phones, computers, and tablets. It includes sending,
posting, or sharing negative, harmful, false, or mean content about someone else. It can also include sharing personal or private
information about someone else, causing embarrassment or humiliation. The most common places where cyberbullying occurs are
social media, text messaging, instant messaging, online forums, chat rooms, message boards, and online gaming communities.
1.1 Purpose
A cyberbullying app aims to prevent and address cyberbullying by providing tools and resources to victims, bystanders, and bullies.
Its features may include reporting and blocking functionality, anonymous reporting, and community building. The app seeks to
empower users to take action against cyberbullying and create a supportive network for those affected.
2. PROBLEM STATEMENT
This project focuses on detecting cyberbullying on Twitter. Cyberbullying is a prevalent problem that affects mental health and
social behavior, yet manually monitoring and controlling it on Twitter is infeasible at the platform's scale. Text classification using
supervised machine learning is commonly used to separate bullying from non-bullying tweets, but it has limitations in handling
tweets whose language changes on the fly. Therefore, this project proposes a deep learning algorithm, BiLSTM, to address the
cyberbullying detection issue by capturing the semantics of the sentence.
IJRTI2305103 International Journal for Research Trends and Innovation (www.ijrti.org) 654
© 2023 IJRTI | Volume 8, Issue 5 | ISSN: 2456-3315
3. PROPOSED SOLUTION
The paper proposes a model for detecting cyberbullying in text using bidirectional LSTM (BiLSTM), which is a type of recurrent
neural network that processes input in both directions to capture sequence dependencies. BiLSTM has been shown to improve
model performance on sequence classification tasks, particularly in natural language processing. The model involves duplicating
the first recurrent layer in the network, providing the input sequence and a reversed copy of it as inputs to two side-by-side
LSTM layers, and then combining their outputs in various ways.
5. SOLUTION ARCHITECTURE
6. TECHNICAL ARCHITECTURE
7. FEATURES
Using a deep learning model, cyberbullying in social networking sites can be analyzed and predicted. This involves collecting
diverse social media data with instances of cyberbullying, preprocessing the text, generating numerical representations, selecting
appropriate deep learning models (e.g., RNNs or CNNs), training them with optimized hyperparameters, evaluating performance
using validation data, deploying the models for predictions on new data, refining predictions with post-processing, and continuously
monitoring and refining the model's effectiveness in real-world scenarios.
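The "numerical representations" step above can be sketched as building a vocabulary over the corpus and encoding each comment as a fixed-length sequence of word ids, the input form a deep learning model expects. The corpus, vocabulary, and sequence length below are illustrative only, not from the paper's dataset.

```python
def build_vocab(texts):
    """Map each unique lowercase word to an integer id (0 is reserved for padding)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # ids start at 1; 0 = padding
    return vocab

def encode(text, vocab, max_len):
    """Encode a text as word ids, padded or truncated to max_len."""
    ids = [vocab.get(w, 0) for w in text.lower().split()]
    return (ids + [0] * max_len)[:max_len]

corpus = ["you are awesome", "you are a loser"]
vocab = build_vocab(corpus)
encoded = [encode(t, vocab, max_len=6) for t in corpus]
print(encoded)  # [[1, 2, 3, 0, 0, 0], [1, 2, 4, 5, 0, 0]]
```

In practice these integer sequences would feed an embedding layer (for example, one initialized from word2vec vectors, as the abstract describes) before the recurrent layers.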
Tokenization, word normalization, and lemmatization are important preprocessing steps for optimizing cyberbullying detection.
Tokenization breaks the text down into individual tokens, such as words or phrases, enabling further analysis. In cyberbullying
detection, tokenization helps identify and count specific bullying or harassing words or phrases.
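A minimal sketch of this step: split a comment into lowercase word tokens and count occurrences from an abusive-term list. The term list here is a hypothetical placeholder, not one used in the paper.

```python
import re
from collections import Counter

ABUSIVE = {"loser", "idiot"}  # placeholder list for illustration

def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

tokens = tokenize("You're such a LOSER, loser!!!")
counts = Counter(t for t in tokens if t in ABUSIVE)
print(tokens)  # ["you're", 'such', 'a', 'loser', 'loser']
print(counts)  # Counter({'loser': 2})
```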
Word normalization reduces different word forms to a common base form, e.g., converting "running" and "ran" to "run". This
technique groups together word variations, aiding in identifying language patterns in bullying messages.
Lemmatization determines the base form of a word based on its part of speech, e.g., converting "am" to "be" and "cats" to "cat".
Applying lemmatization reduces the number of distinct words to analyze, optimizing the detection process.
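Normalization and lemmatization can be sketched with a hand-built lookup table, as below. Real systems use part-of-speech-aware lemmatizers (for example, NLTK's WordNetLemmatizer, which requires external data files); the tiny table here exists only to show the mapping the two paragraphs above describe.

```python
# Illustrative base-form table only; a real lemmatizer covers the full lexicon.
LEMMAS = {"running": "run", "ran": "run",
          "am": "be", "is": "be", "are": "be",
          "cats": "cat"}

def lemmatize(tokens):
    """Replace each token with its base form when one is known."""
    return [LEMMAS.get(t, t) for t in tokens]

print(lemmatize(["cats", "are", "running"]))  # ['cat', 'be', 'run']
```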
By using these techniques as features, machine learning models can automatically identify cyberbullying and enhance the overall
effectiveness of detection methods.
Deep learning models offer automated alternatives for cyberbullying analysis in social networking sites, surpassing manual
detection and intervention. They learn from extensive data, distinguishing between normal interactions and harmful behavior,
enabling swift identification and flagging of cyberbullying incidents. These models continuously adapt to evolving forms of
cyberbullying, enhancing effectiveness over time. Integrating deep learning models into social platforms fosters a safer and more
inclusive online environment, supporting prompt intervention and positive digital experiences.
8. ALGORITHM USED
8.1 BiLSTM
Bidirectional LSTM (BiLSTM) is a type of recurrent neural network used primarily in natural language processing tasks, such as
text classification and sentiment analysis. It works by processing the input sequence in both forward and backward directions,
utilizing information from both sides to better capture the sequence's sequential dependencies. BiLSTM involves duplicating the
first LSTM layer in the network so that there are two layers side-by-side, providing the input sequence as input to the first layer and
a reversed copy of the input sequence to the second layer. The outputs from both layers are then combined in several ways, such as
average, sum, multiplication, or concatenation, to produce the final output. BiLSTM has shown promising results in various NLP
tasks and is widely used in the field.
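The mechanics described above can be sketched as a toy forward pass in NumPy: one LSTM reads the sequence left to right, a second reads a reversed copy, and the two output sequences are merged (here by concatenation, one of the combination options listed). The weights are random, so this illustrates shapes and data flow only, not a trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(xs, W, U, b, hidden):
    """Run one LSTM over the sequence xs; return all hidden states."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outs = []
    for x in xs:
        z = W @ x + U @ h + b                     # all four gates at once
        i, f, o, g = np.split(z, 4)               # input, forget, output, cell
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(0)
T, d, hidden = 5, 8, 16                           # seq length, input dim, units

def params():
    """Fresh random weights for one LSTM layer (gates stacked along axis 0)."""
    return (rng.normal(size=(4 * hidden, d)) * 0.1,
            rng.normal(size=(4 * hidden, hidden)) * 0.1,
            np.zeros(4 * hidden))

xs = rng.normal(size=(T, d))
fwd = lstm_pass(xs, *params(), hidden)            # reads the input sequence
bwd = lstm_pass(xs[::-1], *params(), hidden)[::-1]  # reads the reversed copy
merged = np.concatenate([fwd, bwd], axis=-1)      # concatenation merge
print(merged.shape)  # (5, 32)
```

Swapping `np.concatenate` for a sum or element-wise product gives the other merge modes mentioned above; concatenation is the common default in deep learning frameworks.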
9. PERFORMANCE METRICS
Performance metrics are crucial in evaluating the effectiveness of a deep learning model for detecting cyberbullying in social
networking sites. Commonly used metrics include accuracy, precision, recall, and F1 score. Accuracy measures the overall
correctness of the model's predictions. Precision quantifies the proportion of correctly identified cyberbullying instances among all
predicted positive cases. Recall calculates the ratio of correctly identified cyberbullying instances out of all actual positive cases.
F1 score is the harmonic mean of precision and recall, providing a balanced assessment of the model's performance. These metrics
help assess the model's ability to accurately detect cyberbullying behavior and provide valuable insights for improving its
performance.
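The four metrics follow directly from the confusion-matrix counts. A small self-contained computation, with toy labels chosen only for illustration (1 = bullying, 0 = non-bullying):

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = bullying)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
acc, prec, rec, f1 = metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
```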
10. CONCLUSION
Deep learning models show promise in combating cyberbullying on social networking sites. They accurately detect and identify
instances of cyberbullying, enabling prompt intervention for victims. By leveraging artificial intelligence, platforms can enhance
monitoring and moderation efforts for a safer online environment. Ongoing refinement is necessary to address emerging trends and
ethical considerations. Deep learning models contribute to a more inclusive digital society.
In the future, advancements in deep learning models for addressing cyberbullying in social networking sites could focus on
improving the accuracy and efficiency of detection algorithms. This could involve developing more sophisticated models that can
better understand and interpret the context, tone, and intent behind user interactions. Additionally, integrating real-time monitoring
capabilities and incorporating user feedback loops could help refine the models and adapt to emerging trends and variations in
cyberbullying behavior. Overall, future enhancements should aim to create more robust and comprehensive solutions that can
proactively identify and mitigate cyberbullying incidents to promote a safer online environment.