
Batch-9 Paper

The document proposes a machine learning approach for detecting cyberbullying. It discusses how social media has both benefits but also fuels negative behaviors like cyberbullying. The objective is to introduce a model using machine learning that can determine if a post contains bullying. Several machine learning methods are evaluated for the model, with random forests found to outperform others but TF-IDF features more accurate than bag-of-words. The goal is to automatically identify abusive content on platforms and reduce digital bullying.

Uploaded by

Mavoori Akhil
Copyright
© © All Rights Reserved

EMPOWERING ONLINE SAFETY: A MACHINE LEARNING APPROACH TO CYBERBULLYING DETECTION
B.V. Chowdary, Associate Professor, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad, [email protected]
Mavoori Akhil, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad, [email protected]
Komirishetty Pavan, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad, [email protected]
B. Pavana Teja Reddy, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad, [email protected]
V.S. Gunjan, UG Scholar, Dept of IT, Vignan Institute of Technology and Science (A), Hyderabad, [email protected]

ABSTRACT: With the growth of the Internet, social media use has increased significantly over time, making it the most significant network platform of the twenty-first century. However, the growth of social networks frequently has detrimental effects on society and fuels undesirable phenomena such as cyberbullying, cyber abuse, cybercrime, and online trolling. Particularly for women and children, cyberbullying frequently causes severe mental and physical pain. In some cases, it even drives the victim to attempt suicide. Because of its severe detrimental effects on society, online harassment is garnering attention. Recently, there have been numerous incidents of online bullying all across the world, including exposing private chats, spreading rumors, and making sexual remarks. As a result, there has been growing interest in recognizing bullying texts or messages on social media.
Index Terms: Cyber abuse, social media, online harassment, cyberbullying texts.

I. INTRODUCTION

The Internet is an environment that allows users to engage with society and share everything, including lengthy documents, films, and images [1]. People use their laptops or cell phones to access online communities. Facebook, Twitter, Instagram, and TikTok are among the most popular social media platforms. Social media is used these days for a variety of purposes, including business, education, and charitable endeavors [2, 3, 4]. Additionally, social media boosts the global economy by generating a large number of new job opportunities [5]. Social media has many pros, but it also has certain cons. Malevolent users use online platforms to carry out immoral and dishonest deeds that harm other people, hurt their reputation, and hurt their feelings. Cyberbullying has emerged as a significant social media concern in recent times. Cyber-harassment, often known as cyberbullying, is an electronic

II. OBJECTIVE

In this regard, we introduce a machine learning model for cyberbullying identification that can determine whether a post is related to bullying or not. Several machine learning methods have been examined for the proposed cyberbullying detection model: Naive Bayes, Support Vector Machine, Decision Tree, and Random Forest. Datasets containing posts and comments from Facebook and Twitter were used in our research. We use two distinct feature vectors, bag-of-words (BoW) and TF-IDF, for performance analysis. Results show that Random Forest outperforms every other machine learning technique, and that the TF-IDF feature leads BoW in terms of accuracy. By developing a prototype that can automatically identify abusive conduct and cyberbullying on social media platforms, the research project aims to reduce digital bullying and aggression.

Existing System
In America, nearly fifty percent of all teens have been victims of cyberbullying. Victims of harassment suffer psychological and physical effects. The trauma of cyberbullying is difficult to endure, and some victims resort to suicide or other self-destructive behaviors. Therefore, it is critical to recognize and stop cyberbullying in order to safeguard youngsters. Decision tree techniques are used in the current machine learning application for cyberbullying detection, although this
strategy is not particularly effective at categorizing messages that contain online bullying.
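
The two feature vectors named in the Objective, BoW and TF-IDF, can be contrasted with a minimal sketch. This is not the paper's implementation: the function names `bag_of_words` and `tf_idf`, the toy posts, and the unsmoothed weighting tf = count / length, idf = ln(N / df) are all assumptions chosen for illustration, and libraries typically add smoothing terms.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    vectors = []
    for vec in counts:
        length = sum(vec) or 1
        vectors.append(
            [(c / length) * math.log(n_docs / df[j]) for j, c in enumerate(vec)]
        )
    return vocab, vectors


# Hypothetical toy posts, not taken from the paper's dataset.
posts = ["you are a loser", "you are a great friend", "nobody likes you loser"]
vocab, bow = bag_of_words(posts)
_, weighted = tf_idf(posts)
```

In this toy corpus the word "you" occurs in every post, so BoW still counts it while TF-IDF weights it at zero. That down-weighting of uninformative terms is one plausible reason a TF-IDF representation could score higher than raw counts.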

Proposed System
The framework for identifying cyberbullying is explained in this section; its primary components are shown in Figure 1. Natural language processing (NLP) is the first component, and machine learning (ML) is the second. The initial stage involves gathering data and using natural language processing to build datasets of bullying words, messages, and posts for the machine learning techniques. After the datasets have been examined, machine learning algorithms are trained to identify any harassment or cyberbullying interactions on online platforms like YouTube and Twitter.
Techniques
• Natural Language Processing: Real-world content or posts include a variety of extraneous characters. For instance, punctuation or numerals have no bearing on whether bullying is detected. The posts need to be cleaned before the machine learning techniques are applied.
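
The two stages described above (NLP cleaning, then ML training) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the `clean` rules, the `NaiveBayes` class, the `accuracy` helper, and the toy posts and labels are all hypothetical. Naive Bayes is used only because it is one of the methods the paper evaluates and is short to write out.

```python
import math
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase a post and strip URLs, @mentions, digits and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # links and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)            # digits, punctuation, symbols
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, posts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for post, label in zip(posts, labels):
            self.word_counts[label].update(clean(post))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, post):
        # Words never seen in training are ignored.
        tokens = [t for t in clean(post) if t in self.vocab]
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, posts, labels):
    """Fraction of posts whose predicted label matches the true label."""
    hits = sum(model.predict(p) == y for p, y in zip(posts, labels))
    return hits / len(labels)


# Hypothetical toy data, not taken from the paper's dataset.
train_posts = [
    "You are such a LOSER!!! http://example.com",
    "@sam nobody likes you, idiot 123",
    "Great game today, well played :)",
    "Happy birthday, have a lovely day!",
]
train_labels = ["bullying", "bullying", "not_bullying", "not_bullying"]
test_posts = ["nobody likes a loser like you", "have a lovely game today"]
test_labels = ["bullying", "not_bullying"]

model = NaiveBayes().fit(train_posts, train_labels)
score = accuracy(model, test_posts, test_labels)
```

Keeping the cleaning step inside the model means training and prediction always see text normalized the same way. On real data the held-out accuracy would be measured on a much larger labelled test set.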

III. LITERATURE SURVEY


Researchers have made significant strides in the field of cyber harassment detection using machine learning techniques. One such approach involved a supervised machine learning algorithm that employed a word-by-word method to analyze the sentiment and contextual features of judgments [9]. While initial attempts often resulted in low accuracy, advancements were made by the Massachusetts Institute of Technology through the Ruminati project, which used support vector machines to identify bullying in Facebook comments. This approach incorporated social parameters and achieved an accuracy of 66% [10].

Another noteworthy method was introduced by Reynolds et al. [11], who proposed a bullying detection technique based on proximity modeling. This approach used decision trees and instance-based learners, achieving an accuracy of 78.5%. To enhance cyberbullying detection, researchers have also explored the use of personality, emotions, and sentiments as additional features [12].

Deep learning models have also been deployed to combat cyberbullying. One such model used a deep neural network to analyze real-world data, employing transfer learning to enhance the detection process [13]. Baladitya et al. [14] introduced a deep neural network architecture specifically designed to identify hate speech. Additionally, a convolutional neural network-based model was developed to detect bullying text, incorporating word embeddings to capture semantic similarities [15].

In the realm of multimodal data, researchers faced the challenge of complex correlations between various social media elements. To address this, Cheng et al. [16] proposed XBully, an innovative cyberbullying identification system. XBully reformulated multimodal social media data into a heterogeneous network, enabling the integration of diverse attributes and correlations. Recognizing the evolving nature of cyberbullying, Vuong et al. [17] devised a multimodal recognition system integrating images, videos, comments, and social network activity. Their approach used top-to-bottom attention networks to capture session features and multimedia information effectively.

Neural networks have gained popularity in online harassment identification, with researchers exploring combinations of long short-term memory (LSTM) layers. One study proposed a novel neural network model tailored for text-based cyberbullying detection. Its architecture incorporated long short-term memory layers, convolutional layers, and stacked core layers, improving network efficiency. Additionally, the authors introduced a unique activation method called "Support Vector Machine Activation," enhancing the system's performance.

In summary, ongoing research in cyber-harassment
detection leverages diverse machine learning
techniques, including supervised algorithms, deep
neural networks, and multimodal approaches, to combat
the multifaceted nature of online harassment. These
efforts underscore the importance of continuous
innovation to address the challenges posed by
cyberbullying effectively.

IV. ARCHITECTURE AND METHODOLOGY

A. System Architecture

System architecture refers to the high-level design of a computer-based system. It defines the components or modules that constitute the system, their relationships, and how they interact to achieve the intended functionality. A system architecture description typically includes the following components:
• Components: These are the building blocks of the system. Components can be hardware elements like servers, computers, or devices, as well as software elements like modules, libraries, and databases.
• Modules: Components are often divided into smaller functional units called modules. Modules encapsulate specific features or operations within the system. They can be designed to handle specific tasks, ensuring modularity and ease of maintenance.
• Interfaces: Interfaces define how different modules interact with each other, including the protocols and data formats used for communication. Well-defined interfaces are essential for integration and interoperability between system elements.
• Data Storage: System architecture describes how data is stored, managed, and accessed. It includes databases, file systems, and data structures. Data storage mechanisms are crucial for ensuring data integrity, security, and efficient retrieval.
• Scalability and Performance: System architecture addresses how the system can handle increased loads and demands. Scalability features ensure that the system can expand its capabilities as the user base or data volume grows.
• Deployment: System architecture outlines how the system is deployed in various environments. It includes considerations for physical deployment (such as server locations), cloud-based deployment, and virtualization strategies.

Fig. 1. System Architecture

B. Modules
The development of the project is based on the dataset considered and effective tuning of the parameters of the machine learning algorithms. The system consists of four phases:
1) Data Gathering
2) Data Processing
3) Training Phase
4) Testing Phase

1) Data Gathering: The dataset is a collection of tweets gathered using the Twitter API. It contains more than 1000 tweets from different periods. The following images depict the datasets indicating Text Labels.
2) Data Processing: Preparing raw data for modeling is a critical step, as data obtained from online sources are often inconsistent, incomplete, or contain noise. These irregularities must be addressed to create a dataset suitable for machine learning algorithms. In our case, we focused on obtaining relevant data metrics related to profanity in daily online comments to train our models effectively. The initial dataset was in XML format, which we converted to the standard CSV format commonly used for machine learning purposes. During preprocessing, we handled missing values, removed noise, and addressed inconsistencies in the data. Additionally, we ensured that variables were appropriately scaled and transformed to prevent any single variable from dominating the model's predictions. These meticulous data preparation steps were crucial to creating a clean and reliable dataset, providing a solid foundation for our modeling efforts.
3) Training Phase: To train the model, we first import a specific algorithm class or module and create an instance of it. Using that instance, we fit the model to the training data. We then validate it by checking its accuracy score and tuning its parameters until we get the required results.
4) Testing Phase: To test the model, we compare its predicted values after the training phase with the test data. We then input some different values for prediction and check whether it predicts them correctly. If it does not, we fine-tune the algorithm's parameters and fit the model again.

V. IMPLEMENTATION

A. PyCharm IDE
PyCharm is a widely used Integrated Development Environment (IDE) created especially for Python development. Developed by JetBrains, it provides a robust and user-friendly platform tailored to meet the needs of Python developers. It provides a comprehensive set of features that enhance productivity, code quality, and collaboration.
The IDE offers advanced code error detection and smart suggestions, allowing developers to write code faster and with fewer mistakes. Its powerful refactoring tools simplify the process of restructuring code, making it easier to maintain and improve the quality of existing projects. PyCharm also includes a built-in visual debugger that assists in identifying and fixing bugs efficiently.
PyCharm excels in supporting various web frameworks, such as Flask and Pyramid. It offers dedicated project templates, integrated tools for database management, and seamless integration with popular version control systems like Git. The IDE's web development capabilities streamline the creation of dynamic web applications and ensure smooth collaboration among
team members.
Additionally, PyCharm promotes efficient testing with its integrated test runner and comprehensive testing tools. It facilitates running unit tests and behavioral tests, and even provides support for popular testing frameworks like pytest. The version control features enable seamless collaboration by allowing developers to manage and merge code changes.
Furthermore, PyCharm enhances the development process with its powerful tools for data science and scientific computing. Support for pandas and Matplotlib enables data analysis and visualization within the IDE. PyCharm's user-friendly interface and integration capabilities make it a preferred choice for Python developers, whether they are working on web applications, data science projects, or any other Python-based software development.

B. Python
Python is an interpreted, high-level, dynamic, cross-platform, open-source programming language. Python's philosophy prioritizes readability, clarity, and simplicity while maximizing the programmer's power and expressiveness. The greatest compliment for a Python programmer is that their code is elegant rather than merely clever. For these reasons, Python makes an excellent first language but can also be a very potent tool in the hands of a seasoned coder.
Python is an incredibly versatile language and is used extensively for a variety of purposes. Common applications include:
• Writing web applications using frameworks like Django, Zope, and TurboGears;
• Writing basic system scripts;
• Creating desktop applications using GUI toolkits such as Tkinter or wxPython (and more recently, Windows Forms and IronPython);
• Developing Windows apps.

VI. RESULTS AND OUTPUT

The following screenshots show the results of the cyberbullying detection application for social media that we developed.

Fig. 2 is the user login page of our application.
Fig. 2. Login Status

Fig. 3 is the registration page of our application, where a user can register with unique information.
Fig. 3. Registration Status

Fig. 4 displays the information posted by members of the website and their friends.
Fig. 4. Post Page

Fig. 5 displays the profile of the user, where the user can update and post information.
Fig. 5. Profile Page

VII. CONCLUSION
The cyberbullying detection project stands as a pivotal initiative in promoting online safety and fostering a positive digital atmosphere. This project addresses the pressing issue of cyberbullying across diverse online platforms. The implementation of robust algorithms not only facilitates early intervention and mental health support for victims but also encourages responsible online behavior, making significant strides toward creating secure online spaces. Despite the challenges, including privacy concerns and algorithmic biases, the project's potential for impact is immense. As technologies evolve, it is imperative to refine these systems continually, ensuring they strike the right balance between safeguarding users and preserving freedom of expression. The project not only contributes to immediate online safety but also serves as a foundation for ongoing research, paving the way for an empathetic, respectful digital landscape where individuals can engage, learn, and express themselves without the fear of cyberbullying.

ACKNOWLEDGEMENT
First of all, we would like to extend our deepest
appreciation to Mr. B.V. Chowdary, Associate
Professor, who served as our project’s mentor. Next,
we would like to express our heartfelt gratitude to
Vignan Institute of Technology and Science,
Hyderabad, and especially the Department of
Information Technology for providing our team with
all the tools, resources, help, and direction required
to finish this project.

REFERENCES
[1] C. Fuchs, Social Media: A Critical Introduction. Sage, 2017.
[2] N. Selwyn, "Social media in higher education," Erasmus World of Learning, vol. 1, no. 3, pp. 1–10, 2012.
[3] H. Karjaluoto, P. Ulkuniemi, H. Keinänen, and O. Kuivalainen, "Antecedents of social media business-to-business use in an industrial marketing context: clients' perspective," Journal of Business & Industrial Marketing, 2015.
[4] W. Akram and R. Kumar, "A study on the positive and negative effects of social media on society," International Journal of Computer Sciences and Engineering, vol. 5, no. 10, pp. 351–354, 2017.
[5] D. Tapscott et al., The Digital Economy. McGraw-Hill Education, 2015.
[6] S. Bastiaensens, H. Vandebosch, K. Poels, K. Van Cleemput, A. Desmet, and I. De Bourdeaudhuij, "Cyberbullying on social network sites: a pilot investigation."
[7] D. L. Hoff and S. N. Mitchell, "Cyberbullying: Causes, effects, and remedies," Journal of Educational Administration, 2009.
[8] S. Hinduja and J. W. Patchin, "Bullying, cyberbullying, and suicide," Archives of Suicide Research, vol. 14, no. 3, 2010.
[9] V. Balakrishnan, S. Khan, and H. R. Arabnia, "Improving cyberbullying detection using Twitter users' psychological features and machine learning," Computers & Security, vol. 90, p. 101710, 2020.
[10] S. Agrawal and A. Awekar, "Deep learning for detecting cyberbullying across multiple social media platforms," in European Conference on Information Retrieval. Springer, 2018, pp. 141–153.
[11] M. A. Al-Ajlan and M. Ykhlef, "Deep learning algorithm for cyberbullying detection," International Journal of Advanced Computer Science and Applications, vol. 9, no. 9, 2018.
[12] K. Wang, Q. Xiong, C. Wu, M. Gao, and Y. Yu, "Multi-modal cyberbullying detection on social networks," in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–8.
[13] T. A. Buan and R. Ramachandra, "Automated cyberbullying detection in social media using an SVM activated stacked convolution LSTM network," in Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis, 2020, pp. 170–174.
[14] E. Raisi and B. Huang, "Weakly supervised cyberbullying detection using co-trained ensembles of embedding models," in 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018, pp. 479–486.
[15] D. Perito, C. Castelluccia, M. A. Kaafar, and P. Manils, "How unique and traceable are usernames?" in Proc. 11th Int. Conf. Privacy Enhancing Technologies, 2011, pp. 1–17.
[20] M. A. Al-garadi, K. D. Varathan, and S. D. Ravana, "Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network," Computers in Human Behavior, vol. 63, pp. 433–443, 2016.
