Icsdp Review

The document outlines a project presented at the 4th International Conference on Signal and Data Processing, focusing on developing a real-time multilingual offensive language detection system to enhance safety in digital spaces. The system utilizes advanced machine learning and natural language processing to identify and filter objectionable content across various languages, addressing limitations in existing systems. Future research aims to improve detection accuracy in underrepresented languages and minimize bias in the model's performance.

Lecture Notes in Electrical Engineering

4th International Conference on Signal and Data Processing [ICSDP]
VIT BHOPAL UNIVERSITY

21-22 November 2024

Organized by School of Electrical and Electronics Engineering

Paper ID: 67
Track No: 08
Title: Real-Time Multilingual Offensive Words Detection: Enhancing
Safety in Global Digital Spaces
S.No  Name(s) of the Authors / Affiliation
1. Annagiri Abhiram
   Kalasalingam Academy Of Research And Education, Anand Nagar,
   Krishnan Koil, Srivilliputtur (Via), Virudhunagar (Dt), Tamil Nadu.
2. Bonela Rakesh
   Kalasalingam Academy Of Research And Education, Anand Nagar,
   Krishnan Koil, Srivilliputtur (Via), Virudhunagar (Dt), Tamil Nadu.
3. B Surya Nagendra Prathap
   Kalasalingam Academy Of Research And Education, Anand Nagar,
   Krishnan Koil, Srivilliputtur (Via), Virudhunagar (Dt), Tamil Nadu.
4. Chitturi Jaswanth Kumar
   Kalasalingam Academy Of Research And Education, Anand Nagar,
   Krishnan Koil, Srivilliputtur (Via), Virudhunagar (Dt), Tamil Nadu.
5. G. Nagarajan
   Kalasalingam Academy Of Research And Education, Anand Nagar,
   Krishnan Koil, Srivilliputtur (Via), Virudhunagar (Dt), Tamil Nadu.
6. R. Durga Meena
   Kalasalingam Academy Of Research And Education, Anand Nagar,
   Krishnan Koil, Srivilliputtur (Via), Virudhunagar (Dt), Tamil Nadu.
12/11/2014
Outline
The project aims to create a real-time system that uses machine
learning and natural language processing to identify and filter
objectionable content across multiple languages. It focuses on
improving user experience by offering a context-sensitive,
multilingual solution designed for speed, accuracy, and coverage.
The system is trained on diverse datasets, and testing indicates that
it scales well and is effective. This approach promotes a safer and
more inclusive internet for global users.
Problem Statement
The surge in offensive language on social media platforms has become
a pressing concern: it degrades user experience, fosters toxicity,
and poses potential threats to individuals and communities. An
effective and efficient system that automatically detects and
mitigates offensive language in real time is essential to creating a
safer and more positive online environment. This project addresses
that challenge by designing a robust solution to identify and manage
offensive content on social media platforms, promoting more inclusive
and respectful online discourse.
Objective
The goal is to create a tool that efficiently and accurately
identifies and eliminates offensive language, improving the online
user experience by promoting a more respectful and inclusive digital
environment.
Limitations in Existing Systems
➢ Limited Language Coverage: Most hate speech detection systems are
built for English and a few other major languages, excluding numerous
regional and minority languages. This leaves large blind spots where
harmful content can spread unchecked.
➢ Inconsistent Accuracy Across Languages: Hate speech detection
systems frequently perform well in one language but poorly in others
because of differences in grammar, slang, and syntax, which makes
consistent accuracy across languages hard to maintain.
➢ Cultural and Contextual Variations: Hate speech takes many
different forms depending on the cultural setting. Because existing
systems cannot comprehend local context or cultural nuances, they can
miss or misjudge such content.
➢ Bias in Training Data: Hate speech detection programs are commonly
trained on biased datasets that overrepresent particular languages or
cultures. As a result, the algorithm may overlook hate speech in one
language while flagging harmless content in another, producing biased
results.
Proposed Methodology
➢ Data Preparation: Load the bad-words list and the Twitter dataset.
➢ Text Preprocessing: Clean and vectorize the text.
➢ Model Training: Train and save the Decision Tree classifier.
➢ User Input: The user enters text and selects language.
➢ Translation: Translate text to English (if needed).
➢ Prediction: Classify text, and replace offensive words with asterisks.
➢ Re-translation: Translate filtered text back to the original language.
➢ Output: Display the results to the user via the web interface.
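
The runtime part of these steps (translation, prediction, masking, re-translation) can be sketched as a single function. This is only an illustration of the control flow, not the authors' code: the function name moderate, the label strings, and the three stand-in helpers are assumptions made here so the example runs offline. A real system would plug in a translation service and the trained classifier instead.

```python
from typing import Callable

# Hypothetical type aliases for this sketch; the names are not from the slides.
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
Classify = Callable[[str], str]             # English text -> label
Mask = Callable[[str], str]                 # English text -> masked English text


def moderate(text: str, lang: str, translate: Translate,
             classify: Classify, mask: Mask) -> dict:
    """Run translate -> classify -> mask -> re-translate for one user input."""
    # Translation: only needed when the input is not already English.
    english = text if lang == "en" else translate(text, lang, "en")

    # Prediction: label the English text and mask it only if it was flagged.
    label = classify(english)
    filtered = mask(english) if label != "non_offensive" else english

    # Re-translation: return the filtered text in the user's language.
    output = filtered if lang == "en" else translate(filtered, "en", lang)
    return {"label": label, "text": output}


# Stand-ins (assumptions) so the sketch runs without network access or a model.
def fake_translate(text: str, source: str, target: str) -> str:
    return text  # identity: pretends the text is already in the target language


def fake_classify(text: str) -> str:
    return "offensive" if "idiot" in text.lower() else "non_offensive"


def fake_mask(text: str) -> str:
    return text.replace("idiot", "*****")


result = moderate("you idiot", "es", fake_translate, fake_classify, fake_mask)
print(result)
```

Passing the translator, classifier, and masker in as callables keeps each step replaceable, which also makes the flow easy to test without calling an external translation API.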
Algorithms Used
➢ Regular Expressions for text cleaning and Snowball Stemmer
for word stemming.
➢ Count Vectorizer to convert text into numerical features.
➢ Decision Tree Classifier for classifying text as hate speech,
offensive, or non-offensive.
➢ Google Translate API for handling multilingual translation of
input text.
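
A minimal sketch of how the first three components might fit together, assuming scikit-learn is installed. The cleaning rules, the toy sentences, and the label names are illustrative assumptions, not the authors' data or settings. The Snowball stemming step is left out to keep dependencies small; it would run on the cleaned tokens before vectorizing.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier


def clean(text: str) -> str:
    """Lowercase the text and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                   # user mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, punctuation
    # A stemming pass (e.g. Snowball) would go here; omitted in this sketch.
    return re.sub(r"\s+", " ", text).strip()


# Toy training data, invented for illustration only.
texts = [
    "You are such an idiot!!! http://example.com",
    "What a stupid idea @someone",
    "I hate everyone from that group",
    "People from that group should be banned",
    "Have a lovely day",
    "Thanks for the help, see you tomorrow",
]
labels = [
    "offensive", "offensive",
    "hate_speech", "hate_speech",
    "non_offensive", "non_offensive",
]

cleaned = [clean(t) for t in texts]

# Count Vectorizer: bag-of-words counts, one column per vocabulary word.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(cleaned)

# Decision Tree Classifier: three-way text classification.
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(features, labels)

# Sanity check on the training sentences themselves.
predicted = list(classifier.predict(vectorizer.transform(cleaned)))
print(predicted)
```

With six sentences the tree simply memorizes the training set, so this only shows the wiring. Any accuracy estimate needs a held-out test split.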
Implementation
The implementation begins with data preparation: compiling a
multilingual dataset, then preprocessing the text to remove unwanted
elements and convert it into numerical form. A Decision Tree
Classifier is trained to detect offensive content. User inputs are
translated to English for analysis, offensive words are masked, and
the filtered text is translated back to the original language to
preserve context. The processed text is displayed in real time,
promoting safe communication.
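
The masking step can be illustrated with a short function that replaces each listed word with asterisks of the same length. This is a sketch under assumptions: the function name mask_offensive and the two-word list are hypothetical, and the slides do not say how the authors match words (whole words, case sensitivity, and so on).

```python
import re


def mask_offensive(text: str, bad_words: set[str]) -> str:
    """Replace each listed word with asterisks of the same length."""
    if not bad_words:
        return text
    # Longest entries first so a longer entry wins over a shorter prefix.
    ordered = sorted(bad_words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    # \b restricts matches to whole words, so "hello" is untouched
    # even when "hell" is on the list; matching ignores case.
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda match: "*" * len(match.group(0)), text)


masked = mask_offensive("You IDIOT, that was a stupid move", {"idiot", "stupid"})
print(masked)  # You *****, that was a ****** move
```

Keeping the asterisk run the same length as the original word preserves the sentence layout, which may help the re-translation step keep the surrounding context intact.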

Conclusion
Developing a real-time multilingual offensive language detection
system addresses toxic content on social media and improves user
safety and inclusivity. Using machine learning, the system identifies
and filters harmful language across multiple languages, promoting
respectful online interactions. It moderates content in real time
without disrupting communication, which makes it scalable and
user-friendly for global platforms. By reducing cyberbullying and
hate speech, it fosters a positive digital environment and supports
constructive engagement. This work lays a foundation for AI-driven
content moderation and for safer, more inclusive online communities.
Future Scope
➢ While the model performs well in English and Spanish, it needs
improvement in Bengali, Arabic, and French, where it is less
accurate.
➢ To improve detection accuracy, future research should account for
cultural variations in offensive language.
➢ Google Translate makes it possible to support several languages;
however, translation errors should be kept to a minimum to improve
results in low-performing languages.
➢ Because the model's performance varies by language, future studies
should concentrate on minimizing bias and ensuring consistent results
across languages.
➢ Collecting more data for underrepresented languages, such as
Arabic and Bengali, should improve detection accuracy in subsequent
iterations.
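
One way to track the language gap described above is to report accuracy per language rather than a single overall figure. The sketch below assumes evaluation records of the form (language, true label, predicted label). The four records are made-up placeholders that only show the calculation; they are not results from this project.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: accuracy} from (language, truth, prediction) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, truth, predicted in records:
        total[lang] += 1
        correct[lang] += int(truth == predicted)
    return {lang: correct[lang] / total[lang] for lang in total}


# Placeholder records for illustration only, not measured results.
records = [
    ("en", "offensive", "offensive"),
    ("en", "non_offensive", "non_offensive"),
    ("bn", "offensive", "non_offensive"),
    ("bn", "non_offensive", "non_offensive"),
]

scores = accuracy_by_language(records)
# Gap between the best- and worst-served language; smaller is more consistent.
gap = max(scores.values()) - min(scores.values())
print(scores, gap)
```

Reporting the gap alongside overall accuracy makes it visible when added training data for one language improves the average but leaves another language behind.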

Thank You
