Icsdp Review

The document outlines a project presented at the 4th International Conference on Signal and Data Processing, focusing on developing a real-time multilingual offensive language detection system to enhance safety in digital spaces. The system utilizes advanced machine learning and natural language processing to identify and filter objectionable content across various languages, addressing limitations in existing systems. Future research aims to improve detection accuracy in underrepresented languages and minimize bias in the model's performance.

Lecture Notes in Electrical Engineering

4th International Conference on Signal and Data Processing [ICSDP]
VIT BHOPAL UNIVERSITY

21-22 November 2024


Organized by School of Electrical and Electronics Engineering

Paper ID: 67
Track No: 08
Title: Real-Time Multilingual Offensive Words Detection: Enhancing Safety in Global Digital Spaces
S.No  Name(s) of the Authors     Affiliation
1.    Annagiri Abhiram           Kalasalingam Academy Of Research And Education, Anand Nagar, Krishnan Koil, Srivilliputtur(Via), Virudhunagar(Dt), Tamil Nadu.
2.    Bonela Rakesh              Kalasalingam Academy Of Research And Education, Anand Nagar, Krishnan Koil, Srivilliputtur(Via), Virudhunagar(Dt), Tamil Nadu.
3.    B Surya Nagendra Prathap   Kalasalingam Academy Of Research And Education, Anand Nagar, Krishnan Koil, Srivilliputtur(Via), Virudhunagar(Dt), Tamil Nadu.
4.    Chitturi Jaswanth Kumar    Kalasalingam Academy Of Research And Education, Anand Nagar, Krishnan Koil, Srivilliputtur(Via), Virudhunagar(Dt), Tamil Nadu.
5.    G. Nagarajan               Kalasalingam Academy Of Research And Education, Anand Nagar, Krishnan Koil, Srivilliputtur(Via), Virudhunagar(Dt), Tamil Nadu.
6.    R. Durga Meena             Kalasalingam Academy Of Research And Education, Anand Nagar, Krishnan Koil, Srivilliputtur(Via), Virudhunagar(Dt), Tamil Nadu.

Outline

The project aims to create a real-time system using advanced machine
learning and natural language processing to identify and filter
objectionable content across multiple languages. It focuses on
enhancing user experience by offering a context-sensitive,
multilingual solution with superior speed, accuracy, and coverage.
Trained on diverse datasets, the system demonstrates excellent
scalability and effectiveness through rigorous testing. This innovative
approach promotes a safer and more inclusive internet for global
users.
Problem Statement

The surge in offensive language on social media platforms has become a
pressing concern: it degrades user experience, fosters toxicity, and
poses potential threats to individuals and communities. An effective
and efficient system that automatically detects and mitigates offensive
language in real time is essential to creating a safer and more
positive online environment. This project addresses the challenge by
designing a robust solution to identify and manage offensive content on
social media platforms, thus promoting a more inclusive and respectful
online discourse.
Objective

The goal is to create a tool that efficiently and accurately identifies
and eliminates offensive language, enhancing the online user experience
by promoting a more respectful and inclusive digital environment.
Limitations in Existing Systems
➢ Limited Language Coverage: Most hate speech detection systems are
built for English and a few other major languages, excluding numerous
regional and minority languages. This leaves large blind spots where
harmful content can spread unchecked.
➢ Inconsistent Accuracy Across Languages: Consistent accuracy is hard
to maintain because hate speech detection systems frequently perform
well in one language but poorly in others, owing to variations in
grammar, slang, and syntax.
➢ Cultural and Contextual Variations: Depending on the cultural
setting, hate speech can take many different forms. Existing systems
often miss these forms because they cannot comprehend local context or
cultural nuances.
➢ Bias in Training Data: Hate speech detection programs are commonly
trained on biased datasets that overrepresent particular languages or
cultures. As a result, the algorithm may overlook hate speech in one
language while flagging harmless content in another, producing biased
results.
Proposed Methodology

➢ Data Preparation: Load bad words list and Twitter dataset.
➢ Text Preprocessing: Clean and vectorize the text.
➢ Model Training: Train and save the Decision Tree classifier.
➢ User Input: The user enters text and selects language.
➢ Translation: Translate text to English (if needed).
➢ Prediction: Classify text, and replace offensive words with asterisks.
➢ Re-translation: Translate filtered text back to the original language.
➢ Output: Display the results to the user via the web interface.
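
The translate, mask, and re-translate steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mask_offensive`, `moderate`, and `identity_translate` are hypothetical, the translator is passed in as a plain function so that no particular translation service is assumed, and the classification step is left out to keep the sketch short.

```python
import re
from typing import Callable, Iterable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def mask_offensive(text: str, bad_words: Iterable[str]) -> str:
    """Replace each listed word with asterisks of the same length."""
    words = sorted({w.lower() for w in bad_words if w}, key=len, reverse=True)
    if not words:
        return text
    # Word boundaries prevent masking inside longer, harmless words.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)


def moderate(text: str, lang: str, bad_words: Iterable[str],
             translate: Translator) -> str:
    """Translate to English if needed, mask, then translate back."""
    english = text if lang == "en" else translate(text, lang, "en")
    masked = mask_offensive(english, bad_words)
    return masked if lang == "en" else translate(masked, "en", lang)


def identity_translate(text: str, src: str, dest: str) -> str:
    """Stand-in translator that returns the text unchanged (illustration only)."""
    return text


print(moderate("You are a Jerk, really", "en", ["jerk"], identity_translate))
# prints: You are a ****, really
```

Passing the translator as an argument keeps the masking logic testable without network access. A real translation service may alter the asterisk runs on the way back, so the masked output would need checking after re-translation.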
Algorithms Used

➢ Regular Expressions for text cleaning and Snowball Stemmer for word
stemming.
➢ Count Vectorizer to convert text into numerical features.
➢ Decision Tree Classifier for classifying text as hate speech,
offensive, or non-offensive.
➢ Google Translate API for handling multilingual translation of
input text.
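
As a rough sketch of how the first three components fit together, the example below chains regex cleaning, a count vectorizer, and a decision tree using scikit-learn. The training sentences are made up for illustration, `clean_text` is a hypothetical helper, and stemming is omitted to avoid an extra dependency; the slides do not specify the actual cleaning rules or hyperparameters.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def clean_text(text: str) -> str:
    """Lowercase, then strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Toy, made-up training data; labels follow the three classes named above.
texts = [
    "I hate those people, they should disappear",
    "you are such an idiot @user",
    "what a lovely morning http://example.com",
    "those people are vermin and I hate them",
    "shut up you stupid idiot",
    "thanks for the helpful answer",
]
labels = ["hate speech", "offensive", "non-offensive",
          "hate speech", "offensive", "non-offensive"]

# CountVectorizer calls clean_text on each document before tokenizing.
model = make_pipeline(
    CountVectorizer(preprocessor=clean_text),
    DecisionTreeClassifier(random_state=0),
)
model.fit(texts, labels)

print(model.predict(["thanks for the lovely answer"]))
```

A stemmer such as NLTK's `SnowballStemmer("english")` could be applied to each token inside `clean_text`. With only six sentences the predictions are not meaningful; the point is the shape of the pipeline.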
Implementation

The implementation begins with data preparation: compiling a
multilingual dataset and preprocessing the text to remove unwanted
elements and convert it into numerical form. A Decision Tree Classifier
is then trained to detect offensive content. User inputs are translated
to English for analysis, offensive words are masked, and the masked
text is translated back into the original language so that its context
is preserved. The processed text is displayed in real time, promoting
safe communication.


Conclusion

Developing a real-time multilingual offensive language detection
system addresses toxic content on social media, enhancing user
safety and inclusivity. Leveraging advanced machine learning, it
efficiently identifies and filters harmful language across various
languages, promoting respectful online interactions. The system
ensures real-time moderation without disrupting communication,
making it scalable and user-friendly for global platforms. By
reducing cyberbullying and hate speech, it fosters a positive digital
environment and empowers users for constructive engagement.
This initiative sets a foundation for AI-driven content management,
creating safer and more inclusive online communities.
Future Scope

➢ While the model does well in English and Spanish, it needs
improvement in Bengali, Arabic, and French, where it performs less
accurately.
➢ To improve detection accuracy, future research should take into
account cultural variations in offensive language.
➢ Google Translate makes it possible to support several languages.
However, for better outcomes in low-performing languages, translation
mistakes should be kept to a minimum.
➢ Because the model's performance varies by language, future studies
should concentrate on minimizing bias and ensuring consistent outcomes.
➢ Further data collection for underrepresented languages, such as
Arabic and Bengali, should increase detection accuracy in subsequent
iterations.
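
One simple way to track the per-language variation mentioned above is to compute accuracy separately for each language and look at the gap between the best and worst. The sketch below is an assumption about how that could be measured, not a procedure from the slides; `accuracy_by_language` is a hypothetical helper and the records are made up, not results from this work.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """records: iterable of (language, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, truth, pred in records:
        total[lang] += 1
        correct[lang] += int(truth == pred)
    return {lang: correct[lang] / total[lang] for lang in total}


# Made-up records for illustration only; not results from the paper.
sample = [
    ("en", "offensive", "offensive"),
    ("en", "non-offensive", "non-offensive"),
    ("bn", "offensive", "non-offensive"),
    ("bn", "non-offensive", "non-offensive"),
]
scores = accuracy_by_language(sample)
# Gap between the best and worst language: smaller means more consistent.
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

Reporting the gap alongside overall accuracy makes it visible when gains in one language mask regressions in another.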
Thank You