
Sign2Speech

Shivam Yadav, Mrudula More, Tanmay Pawar
Department of Information Technology
Vidyavardhini’s College of Engineering & Technology, University of Mumbai
[email protected], [email protected], [email protected]

Abstract—Sign2Speech is an innovative system designed to empower the deaf and hard-of-hearing community by enabling seamless communication with hearing individuals. The core functionality of the system is its ability to detect Indian Sign Language (ISL) gestures using computer vision techniques and convert them into spoken words through natural language processing (NLP). The system is built with a robust backend using Flask and Ngrok, which allows for real-time sign language detection via webcam input and makes the system accessible from any device with internet connectivity.

The sign language recognition model is trained using MediaPipe's hand landmark detection, which identifies 21 key hand landmarks in each video frame, combined with OpenCV for image/video input/output and webcam capture. The processed data is fed into a machine learning model developed using scikit-learn, ensuring accurate gesture recognition.

Keywords – Indian Sign Language, Sign Language Recognition, Natural Language Processing, Real-time Detection, Flutter, Flask, Ngrok, MediaPipe, OpenCV, Scikit-learn, Deaf Accessibility, Job Opportunities, Sign Language Courses, Machine Learning, Audio Conversion, Hand Landmark Detection, Text-to-Speech, Inclusivity, Communication Accessibility.

I. INTRODUCTION

Sign language is an essential means of communication for the deaf and hard-of-hearing community worldwide. However, there remains a significant communication barrier between the deaf and hearing populations, primarily due to the lack of widespread understanding of sign language among non-deaf individuals. While there have been various attempts to bridge this gap, existing solutions are often limited by factors such as accessibility, accuracy, and real-time processing.

The Sign2Speech project aims to address these challenges by providing a real-time Indian Sign Language (ISL) recognition system that converts sign language gestures into spoken words. The system utilizes advanced machine learning techniques in conjunction with computer vision tools to detect and interpret hand gestures from video input. By leveraging MediaPipe's hand landmark detection, OpenCV for image and video processing, and scikit-learn for model training and evaluation, the system is capable of recognizing complex sign language gestures with high accuracy.

In addition to sign language translation, the system also provides valuable resources for the deaf community, such as job listings and courses for learning sign language. These resources aim to improve the quality of life for deaf individuals by promoting employment opportunities and enhancing access to educational tools for sign language learning.

This system not only enables real-time sign language detection but also strives to improve inclusivity and accessibility for the deaf community by addressing gaps in communication, education, and employment.

II. LITERATURE REVIEW

[1] Traditional Sign Language Recognition Approaches: Early research in sign language recognition relied on data gloves and sensor-based systems, which captured hand movements and joint angles to identify gestures. While these systems provided high accuracy, they required users to wear specialized hardware, which limited their scalability and accessibility. Projects such as American Sign Language (ASL) recognition systems based on data gloves demonstrated feasibility but lacked the user-friendliness needed for widespread adoption.

[2] Computer Vision-Based Approaches: With the evolution of computer vision technologies, camera-based sign language recognition systems gained popularity. Techniques involving skin color segmentation, background subtraction, and contour detection were used to identify hand gestures. However, these methods were sensitive to lighting conditions, backgrounds, and hand occlusions, leading to inconsistent results in real-world applications.

[3] Deep Learning and CNN-Based Models: Recent advances in deep learning, particularly Convolutional Neural Networks (CNNs), have significantly improved the performance of gesture recognition systems. CNNs have been applied to classify sign language images and videos with impressive accuracy, and several studies have employed large datasets of hand gestures to train models that recognize a wide range of signs. Although deep learning models offer high performance, they often require extensive computational resources and large datasets, which may not be practical for real-time or mobile applications.

[4] Use of MediaPipe and Hand Landmarks: A notable improvement in hand gesture recognition came with the introduction of Google's MediaPipe framework, which offers real-time hand landmark detection using lightweight models. MediaPipe provides 21 key hand landmarks per frame, allowing more precise and consistent tracking of hand movements. This technique has been adopted in several recent studies on real-time sign language recognition due to its speed, accuracy, and minimal hardware requirements.

III. OVERVIEW OF PROJECT

The Sign2Speech project is a real-time Indian Sign Language (ISL) recognition system designed to convert hand gestures into spoken language while providing additional support resources for the deaf and hard-of-hearing community. The system is built as a cross-platform solution using Flutter for the frontend and Flask (Python) for the backend, integrated through Ngrok for secure remote access. This section outlines the architecture, core components, technologies used, and key features of the system.

[1] Flutter: Flutter is an open-source UI software development toolkit created by Google. It allows developers to build natively compiled applications for mobile, web, and desktop from a single codebase. In the Sign2Speech project, Flutter is used to build a user-friendly and interactive frontend that works seamlessly across Android devices, iOS devices, and web browsers.
beginner to advanced levels. These modules use engaging
[2] Flask: Flask is a lightweight and versatile web framework written in Python, widely used for developing RESTful APIs and backend applications. In this project, Flask acts as the core backend server responsible for handling incoming HTTP requests from the Flutter frontend. When a user uploads an image or captures a live frame from the camera, the Flask backend receives this input and processes it through the gesture recognition pipeline.

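The paper does not reproduce the backend source code, so the following is only a minimal sketch of how such an endpoint could be structured. The route name /predict, the form field "image", and the model file "sign_model.pkl" are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a Flask prediction endpoint (route, field, and file names are assumptions).
import cv2
import joblib
import mediapipe as mp
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("sign_model.pkl")              # hypothetical path to the trained classifier
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

@app.route("/predict", methods=["POST"])
def predict():
    # The Flutter frontend is assumed to POST one frame as multipart form data.
    data = request.files["image"].read()
    frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)

    # Extract the 21 hand landmarks and flatten them into 42 features (x, y pairs).
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return jsonify({"label": None, "message": "no hand detected"})
    landmarks = results.multi_hand_landmarks[0].landmark
    features = [value for point in landmarks for value in (point.x, point.y)]

    label = model.predict([features])[0]
    return jsonify({"label": str(label)})

if __name__ == "__main__":
    app.run(port=5000)
```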
step towards a more inclusive society.[1]
[3] Ngrok: Ngrok is a tunneling service that allows developers to expose their local servers to the internet via secure public URLs. It is used in the Sign2Speech system to make the locally hosted Flask backend accessible to the Flutter frontend running on mobile or web platforms, regardless of network constraints. Ngrok creates an encrypted tunnel from a publicly accessible domain to the localhost.

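The tunneling code is not included in the paper; a minimal sketch using the pyngrok wrapper, one common way to drive Ngrok from Python and assumed here rather than taken from the authors' implementation, could look like this:

```python
# Sketch: expose the local Flask backend through an Ngrok tunnel.
# Assumes the Flask app from the previous sketch lives in app.py as `app`
# and that an Ngrok auth token is already configured on the machine.
from pyngrok import ngrok

from app import app  # hypothetical module name for the Flask backend

# Open a tunnel to the port the Flask server will listen on.
tunnel = ngrok.connect(5000)
print("Public URL for the Flutter frontend:", tunnel.public_url)

# Run the local server; requests to the public URL are forwarded to localhost:5000.
app.run(port=5000)
```

In practice the printed public URL would be configured in the Flutter app so that its HTTP requests reach the tunnel.
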
[4] MediaPipe: MediaPipe is a cross-platform, open-source framework developed by Google for building multimodal (e.g., video, audio) machine learning pipelines. Its hand-tracking module, one of its most significant components, is used in this project for accurate and efficient extraction of hand landmarks. It provides 21 hand keypoints per frame, capturing detailed information about finger positions and palm orientation. These landmarks are essential for recognizing Indian Sign Language gestures, as they represent the spatial configuration of the hand.

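To make this representation concrete, the sketch below (not from the paper; the file name and structure are assumptions) runs MediaPipe Hands on one image and flattens the 21 (x, y) landmark pairs into the 42-value feature vector used by the classifier:

```python
# Sketch: extract MediaPipe's 21 hand landmarks from one image and flatten them
# into a 42-value feature vector (x, y per landmark).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def landmarks_to_features(image_bgr):
    """Return a list of 42 floats (x0, y0, ..., x20, y20), or None if no hand is found."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    hand = results.multi_hand_landmarks[0]
    features = []
    for lm in hand.landmark:          # 21 landmarks, coordinates normalized to [0, 1]
        features.extend([lm.x, lm.y])
    return features

if __name__ == "__main__":
    image = cv2.imread("sample_gesture.jpg")   # hypothetical sample image
    print(landmarks_to_features(image))
```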

[5] OpenCV: OpenCV (Open Source Computer Vision Library) is a widely used library for computer vision and image processing tasks. In this project, OpenCV handles video capture from the webcam, preprocesses image data, and renders the detected hand landmarks for visualization. It acts as a bridge between the raw camera input and the machine learning pipeline.

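As an illustration of this role, the sketch below (an assumption about the general shape of the capture loop, not the authors' code) grabs webcam frames with OpenCV, feeds them to MediaPipe, and draws the detected landmarks on screen:

```python
# Sketch: webcam capture with OpenCV and on-screen rendering of detected landmarks.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)                              # default webcam
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)                     # mirror view for a natural feel
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Sign2Speech - hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):          # press 'q' to quit
            break

cap.release()
cv2.destroyAllWindows()
```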

[6] Scikit-learn: Scikit-learn is a robust machine learning library in Python that supports a wide range of supervised and unsupervised learning algorithms. For the Sign2Speech project, it is used to train a Random Forest classifier on a dataset derived from MediaPipe's 21 hand landmarks (42 features: the x and y coordinates). The library's simple API allows for efficient model training, hyperparameter tuning, evaluation, and testing.

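A minimal sketch of the training call on such 42-feature vectors is shown below; the array shapes are the point here, and the random data only stands in for the real landmark dataset, which the paper does not publish. The gesture label names are examples, not the actual vocabulary.

```python
# Sketch: fit a Random Forest on 42-feature landmark vectors (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for the real dataset: N samples x 42 features (x, y of 21 landmarks).
X = np.random.rand(500, 42)
y = np.random.choice(["hello", "thank_you", "yes", "no"], size=500)  # example gesture labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```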

IV. FEATURES OF THE PROJECT

[1] Real-Time Sign-to-Speech and Text Conversion: Sign2Speech instantly converts Indian Sign Language (ISL) gestures into both spoken words and on-screen text. This allows individuals with hearing and speech impairments to communicate with others in real time, improving day-to-day interactions in personal and professional settings [4].

[2] Intuitive and Accessible User Interface: The application is designed with a user-friendly Flutter-based interface that ensures ease of navigation for deaf and mute users. Clear layouts and accessible controls provide a seamless experience, even for users new to digital platforms [1].

[3] Interactive Learning Modules: Users can access structured, video-based tutorials that teach ISL from beginner to advanced levels. These modules use engaging animations and real-life gesture examples to make learning effective and enjoyable [2].

[4] Courses to Learn Indian Sign Language: To promote awareness and learning, the app provides curated links and resources for online and offline courses on Indian Sign Language. This helps individuals who want to learn ISL for communication, professional development, or educational purposes. Encouraging ISL literacy among non-signers is a step towards a more inclusive society [1].

[5] Cross-Platform Compatibility: The entire system is developed using Flutter, ensuring that it works seamlessly on Android, iOS, and web platforms. Users can access the app from a smartphone, tablet, or browser, with consistent performance and user experience across devices [5].

V. METHODOLOGY

[1] Data Collection: We created a dataset consisting of images and video clips of Indian Sign Language (ISL) gestures, where each gesture represents a commonly used word or phrase. Data was captured under controlled lighting conditions and from different users to increase the model's accuracy and its adaptability across variations.

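The collection script itself is not described in the paper; one plausible sketch (directory layout and key bindings are assumptions) saves webcam frames into a folder per gesture label:

```python
# Sketch: collect labelled gesture images from the webcam (press 's' to save, 'q' to quit).
import os
import cv2

label = "hello"                      # hypothetical gesture label for this recording session
out_dir = os.path.join("dataset", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("collecting: " + label, frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):              # save the current frame under the gesture's folder
        cv2.imwrite(os.path.join(out_dir, f"{label}_{count:04d}.jpg"), frame)
        count += 1
    elif key == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```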

[2] Hand Landmark Detection (Feature Extraction): For feature extraction, we used MediaPipe's hand tracking module, which detects 21 landmarks per hand. Each landmark provides (x, y) coordinates normalized to the frame. These points capture the hand's posture and position, forming the basis for recognizing specific gestures.

[3] Data Preprocessing: The extracted landmarks are normalized to a consistent scale to reduce variation due to hand size or camera distance. Frames are uniformly sampled from videos, and each gesture is assigned a label. Data augmentation such as flipping is applied to increase robustness, and invalid or noisy samples are removed to maintain quality.

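The paper does not give the exact normalization formula; the sketch below shows one common scheme consistent with the description above: landmarks are re-expressed relative to the wrist and scaled, and a mirrored copy serves as the flip augmentation.

```python
# Sketch: one plausible normalization and flip-augmentation scheme for the
# 42-value landmark vectors (the exact formulas are not specified in the paper).
import numpy as np

def normalize(features):
    """Center landmarks on the wrist (landmark 0) and scale to unit size."""
    pts = np.asarray(features, dtype=float).reshape(21, 2)
    pts -= pts[0]                          # translate so the wrist is at the origin
    scale = np.abs(pts).max()
    if scale > 0:
        pts /= scale                       # remove hand-size / camera-distance variation
    return pts.flatten()

def flip_horizontal(features):
    """Mirror a centered hand left-right (negate x) as a simple augmentation."""
    pts = np.asarray(features, dtype=float).reshape(21, 2).copy()
    pts[:, 0] *= -1
    return pts.flatten()

# Example usage, building on the MediaPipe sketch in Section III [4]:
#   raw = landmarks_to_features(image)     # 42 raw values in [0, 1]
#   sample = normalize(raw)                # training feature vector
#   mirrored = flip_horizontal(sample)     # augmentation applied after centering
```
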
[4] Model Training: We used a Random Forest classifier from scikit-learn for training. Each sample contains 42 features, the (x, y) pairs from the 21 landmarks. The dataset is split into training and testing sets (80:20). Grid search and cross-validation are used to fine-tune hyperparameters, and the final model is saved for deployment.

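Extending the earlier scikit-learn sketch, this tuning-and-saving step could look as follows; the parameter grid and the file name "sign_model.pkl" are illustrative assumptions, and the random arrays again stand in for the real dataset.

```python
# Sketch: 80:20 split, grid search with cross-validation, and saving the final model.
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# X: (N, 42) normalized landmark vectors, y: (N,) gesture labels (placeholder data here).
X = np.random.rand(500, 42)
y = np.random.choice(["hello", "thank_you", "yes", "no"], size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"n_estimators": [100, 200], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))

joblib.dump(search.best_estimator_, "sign_model.pkl")   # loaded later by the Flask backend
```
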
[5] Frontend Development with Flutter: The frontend is built using Flutter for cross-platform compatibility. It allows users to capture live gestures or upload images. Detected gestures are shown as text and spoken aloud. The UI is designed to be simple and accessible, with additional features such as job listings and ISL learning modules.

[6] Backend API Development: The Flask backend receives gesture input from the frontend, processes the frame using OpenCV and MediaPipe, and performs inference using the trained model. The response is then sent back to the frontend as a text label, which is converted into speech using a text-to-speech engine.

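The paper does not name the text-to-speech engine or state whether the conversion runs on the device or the server. As one possibility, a server-side sketch using the pyttsx3 library would be:

```python
# Sketch: speak a predicted gesture label with an offline text-to-speech engine.
# pyttsx3 is only one possible choice; the paper does not specify the TTS engine used.
import pyttsx3

def speak(label):
    engine = pyttsx3.init()
    engine.say(label)          # queue the recognized word or phrase
    engine.runAndWait()        # block until playback finishes

speak("hello")                 # e.g., the label returned by the /predict endpoint sketch
```
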
[7] Backend Integration with Flutter: The backend is exposed to the internet using Ngrok, enabling seamless integration with the Flutter frontend across different platforms without deploying to a public server. HTTP requests from the app are routed to Flask via a secure tunnel, making the system accessible in real time from any connected device.

VI. RESULT

As a result, an Android application was created. Glimpses of the generated results are shown in Fig. I (Sign Detection), Fig. II (Job Opportunities Page), and Fig. III (Online Courses Page).

VII. CONCLUSION

The Sign2Speech project presents an innovative and practical approach to bridging the communication gap between hearing and hearing-impaired individuals through the use of real-time Indian Sign Language (ISL) recognition and natural language processing. By integrating advanced technologies such as MediaPipe for hand landmark detection, OpenCV for image processing, and scikit-learn for gesture classification, the system demonstrates high accuracy and responsiveness in detecting hand signs. The use of Flutter for the frontend ensures a smooth, cross-platform user experience across mobile and web applications, while Flask and Ngrok enable flexible and accessible backend operations.

Moreover, the inclusion of features such as text-to-speech conversion, job opportunities, and learning resources makes the application more than just a recognition tool; it becomes a comprehensive platform for empowerment, education, and inclusion. The system supports both live camera detection and image uploads, catering to diverse user scenarios and hardware capabilities.

In conclusion, Sign2Speech is a significant step toward inclusive technology. It not only showcases the potential of machine learning in real-world applications but also contributes positively to social good by enhancing accessibility and creating opportunities for the deaf community. Future improvements include multi-hand detection, sentence formation, and integration with regional languages.

VIII. REFERENCES

[1] M. A. Hearst, "Design recommendations for accessible interactive systems," IEEE Transactions on Human-Machine Systems, vol. 50, no. 2, pp. 145–157, Apr. 2020. doi: 10.1109/THMS.2020.2964214.

[2] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, "Hand gesture recognition with 3D convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 1–7.

[3] F. Zhang, C. Wang, J. Zhu, and H. Lu, "Real-time hand gesture recognition using finger segmentation," in 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018, pp. 3105–3110. doi: 10.1109/ICPR.2018.8545524.

[4] S. Biswas, R. D. Das, and S. Basu, "A real-time hand gesture recognition system for human-computer interaction," in 2019 IEEE Calcutta Conference (CALCON), Kolkata, India, 2019, pp. 298–303. doi: 10.1109/CALCON.2019.8776683.

[5] Google, "MediaPipe: Cross-platform, customizable ML solutions for live and streaming media," [Online]. Available: https://mediapipe.dev.

[6] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 2001, pp. I-511–I-518.
