
Sign2Speech

Shivam Yadav, Mrudula More, Tanmay Pawar
Department of Information Technology
Vidyavardhini’s College of Engineering & Technology, University of Mumbai
[email protected], [email protected], [email protected]

Abstract—Sign2Speech is an innovative system designed to empower the deaf and hard-of-hearing community by enabling seamless communication with hearing individuals. The core functionality of the system is its ability to detect Indian Sign Language (ISL) gestures using computer vision techniques and convert them into spoken words through natural language processing (NLP). The system is built with a robust backend using Flask and Ngrok, which allows for real-time sign language detection via webcam input and makes the system accessible from any device with internet connectivity.

The sign language recognition model is trained using MediaPipe's hand landmark detection, which identifies 21 key hand landmarks in each video frame, combined with OpenCV for image/video input/output and webcam capture. The processed data is fed into a machine learning model developed using scikit-learn, ensuring accurate gesture recognition.

Keywords – Indian Sign Language, Sign Language Recognition, Natural Language Processing, Real-time Detection, Flutter, Flask, Ngrok, MediaPipe, OpenCV, Scikit-learn, Deaf Accessibility, Job Opportunities, Sign Language Courses, Machine Learning, Audio Conversion, Hand Landmark Detection, Text-to-Speech, Inclusivity, Communication Accessibility.

I. INTRODUCTION

Sign language is an essential means of communication for the deaf and hard-of-hearing community worldwide. However, there remains a significant communication barrier between the deaf and hearing populations, primarily due to the lack of widespread understanding of sign language among non-deaf individuals. While there have been various attempts to bridge this gap, existing solutions are often limited by factors such as accessibility, accuracy, and real-time processing.

The Sign2Speech project aims to address these challenges by providing a real-time Indian Sign Language (ISL) recognition system that converts sign language gestures into spoken words. The system utilizes advanced machine learning techniques in conjunction with computer vision tools to detect and interpret hand gestures from video input. By leveraging MediaPipe's hand landmark detection, OpenCV for image and video processing, and scikit-learn for model training and evaluation, the system is capable of recognizing complex sign language gestures with high accuracy.

In addition to sign language translation, the system also provides valuable resources for the deaf community, such as job listings and courses for learning sign language. These resources aim to improve the quality of life for deaf individuals by promoting employment opportunities and enhancing access to educational tools for sign language learning.

This system not only enables real-time sign language detection but also strives to improve inclusivity and accessibility for the deaf community by addressing gaps in communication, education, and employment.

II. LITERATURE REVIEW

[1] Traditional Sign Language Recognition Approaches: Early research in sign language recognition relied on data gloves and sensor-based systems, which captured hand movements and joint angles to identify gestures. While these systems provided high accuracy, they required users to wear specialized hardware, which limited their scalability and accessibility. Projects such as American Sign Language (ASL) recognition systems based on data gloves demonstrated feasibility but lacked the user-friendliness needed for widespread adoption.

[2] Computer Vision-Based Approaches: With the evolution of computer vision technologies, camera-based sign language recognition systems gained popularity. Techniques involving skin color segmentation, background subtraction, and contour detection were used to identify hand gestures. However, these methods were sensitive to lighting conditions, backgrounds, and hand occlusions, leading to inconsistent results in real-world applications.

[3] Deep Learning and CNN-Based Models: Recent advances in deep learning, particularly Convolutional Neural Networks (CNNs), have significantly improved the performance of gesture recognition systems. CNNs have been applied to classify sign language images and videos with impressive accuracy, and several studies have employed large datasets of hand gestures to train models that recognize a wide range of signs. Although deep learning models offer high performance, they often require extensive computational resources and large datasets, which may not be practical for real-time or mobile applications.

[4] Use of MediaPipe and Hand Landmarks: A notable improvement in hand gesture recognition came with the introduction of Google's MediaPipe framework, which offers real-time hand landmark detection using lightweight models. MediaPipe provides 21 key hand landmarks per frame, allowing more precise and consistent tracking of hand movements. This technique has been adopted in several recent studies on real-time sign language recognition due to its speed, accuracy, and minimal hardware requirements.

III. OVERVIEW OF PROJECT

The Sign2Speech project is a real-time Indian Sign Language (ISL) recognition system designed to convert hand gestures into spoken language while providing additional support resources for the deaf and hard-of-hearing community. The system is built as a cross-platform solution using Flutter for the frontend and Flask (Python) for the backend, integrated through Ngrok for secure remote access. This section outlines the architecture, core components, technologies used, and key features of the system.

[1] Flutter: Flutter is an open-source UI software development toolkit created by Google. It allows developers to build natively compiled applications for mobile, web, and desktop from a single codebase. In the Sign2Speech project, Flutter is used to build a user-friendly and interactive frontend that works seamlessly across Android devices, iOS devices, and web browsers.
beginner to advanced levels. These modules use engaging
[2] Flask: Flask is a lightweight and versatile web framework written in Python, widely used for developing RESTful APIs and backend applications. In this project, Flask acts as the core backend server responsible for handling incoming HTTP requests from the Flutter frontend. When a user uploads an image or captures a live frame from the camera, the Flask backend receives this input and processes it through the gesture recognition pipeline.

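The paper does not reproduce the backend source code, so the following is only a minimal sketch of how such an endpoint could be structured. The route name /predict, the form field "image", and the model file "sign_model.pkl" are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a Flask prediction endpoint (route, field, and file names are assumptions).
import cv2
import joblib
import mediapipe as mp
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("sign_model.pkl")              # hypothetical path to the trained classifier
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

@app.route("/predict", methods=["POST"])
def predict():
    # The Flutter frontend is assumed to POST one frame as multipart form data.
    data = request.files["image"].read()
    frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)

    # Extract the 21 hand landmarks and flatten them into 42 features (x, y pairs).
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return jsonify({"label": None, "message": "no hand detected"})
    landmarks = results.multi_hand_landmarks[0].landmark
    features = [value for point in landmarks for value in (point.x, point.y)]

    label = model.predict([features])[0]
    return jsonify({"label": str(label)})

if __name__ == "__main__":
    app.run(port=5000)
```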
step towards a more inclusive society.[1]
[3] Ngrok: Ngrok is a tunneling service that allows developers to expose their local servers to the internet via secure public URLs. It is used in the Sign2Speech system to make the locally hosted Flask backend accessible to the Flutter frontend running on mobile or web platforms, regardless of network constraints. Ngrok creates an encrypted tunnel from a publicly accessible domain to the localhost.

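The tunneling code is not included in the paper; a minimal sketch using the pyngrok wrapper, one common way to drive Ngrok from Python and assumed here rather than taken from the authors' implementation, could look like this:

```python
# Sketch: expose the local Flask backend through an Ngrok tunnel.
# Assumes the Flask app from the previous sketch lives in app.py as `app`
# and that an Ngrok auth token is already configured on the machine.
from pyngrok import ngrok

from app import app  # hypothetical module name for the Flask backend

# Open a tunnel to the port the Flask server will listen on.
tunnel = ngrok.connect(5000)
print("Public URL for the Flutter frontend:", tunnel.public_url)

# Run the local server; requests to the public URL are forwarded to localhost:5000.
app.run(port=5000)
```

In practice the printed public URL would be configured in the Flutter app so that its HTTP requests reach the tunnel.
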
[4] MediaPipe: MediaPipe is a cross-platform, open-source framework developed by Google for building multimodal (e.g., video, audio) machine learning pipelines. Its hand-tracking module, one of its most significant components, is used in this project for accurate and efficient extraction of hand landmarks. It provides 21 hand keypoints per frame, capturing detailed information about finger positions and palm orientation. These landmarks are essential for recognizing Indian Sign Language gestures, as they represent the spatial configuration of the hand.

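To make this representation concrete, the sketch below (not from the paper; the file name and structure are assumptions) runs MediaPipe Hands on one image and flattens the 21 (x, y) landmark pairs into the 42-value feature vector used by the classifier:

```python
# Sketch: extract MediaPipe's 21 hand landmarks from one image and flatten them
# into a 42-value feature vector (x, y per landmark).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def landmarks_to_features(image_bgr):
    """Return a list of 42 floats (x0, y0, ..., x20, y20), or None if no hand is found."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    hand = results.multi_hand_landmarks[0]
    features = []
    for lm in hand.landmark:          # 21 landmarks, coordinates normalized to [0, 1]
        features.extend([lm.x, lm.y])
    return features

if __name__ == "__main__":
    image = cv2.imread("sample_gesture.jpg")   # hypothetical sample image
    print(landmarks_to_features(image))
```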

[5] OpenCV: OpenCV (Open Source Computer Vision Library) is a widely used library for computer vision and image processing tasks. In this project, OpenCV handles video capture from the webcam, preprocesses image data, and renders the detected hand landmarks for visualization. It acts as a bridge between the raw camera input and the machine learning pipeline.

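As an illustration of this role, the sketch below (an assumption about the general shape of the capture loop, not the authors' code) grabs webcam frames with OpenCV, feeds them to MediaPipe, and draws the detected landmarks on screen:

```python
# Sketch: webcam capture with OpenCV and on-screen rendering of detected landmarks.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)                              # default webcam
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.flip(frame, 1)                     # mirror view for a natural feel
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Sign2Speech - hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):          # press 'q' to quit
            break

cap.release()
cv2.destroyAllWindows()
```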

[6] Scikit-learn: Scikit-learn is a robust machine learning library in Python that supports a wide range of supervised and unsupervised learning algorithms. For the Sign2Speech project, it is used to train a Random Forest classifier on a dataset derived from MediaPipe's 21 hand landmarks (42 features: the x and y coordinates). The library's simple API allows for efficient model training, hyperparameter tuning, evaluation, and testing.

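A minimal sketch of the training call on such 42-feature vectors is shown below; the array shapes are the point here, and the random data only stands in for the real landmark dataset, which the paper does not publish. The gesture label names are examples, not the actual vocabulary.

```python
# Sketch: fit a Random Forest on 42-feature landmark vectors (placeholder data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for the real dataset: N samples x 42 features (x, y of 21 landmarks).
X = np.random.rand(500, 42)
y = np.random.choice(["hello", "thank_you", "yes", "no"], size=500)  # example gesture labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```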

IV. FEATURES OF THE PROJECT

[1] Real-Time Sign-to-Speech and Text Conversion: Sign2Speech instantly converts Indian Sign Language (ISL) gestures into both spoken words and on-screen text. This allows individuals with hearing and speech impairments to communicate with others in real time, improving day-to-day interactions in personal and professional settings [4].

[2] Intuitive and Accessible User Interface: The application is designed with a user-friendly Flutter-based interface that ensures ease of navigation for deaf and mute users. Clear layouts and accessible controls provide a seamless experience, even for users new to digital platforms [1].

[3] Interactive Learning Modules: Users can access structured, video-based tutorials that teach ISL from beginner to advanced levels. These modules use engaging animations and real-life gesture examples to make learning effective and enjoyable [2].

[4] Courses to Learn Indian Sign Language: To promote awareness and learning, the app provides curated links and resources for online and offline courses on Indian Sign Language. This helps individuals who want to learn ISL for communication, professional development, or educational purposes. Encouraging ISL literacy among non-signers is a step towards a more inclusive society [1].

[5] Cross-Platform Compatibility: The entire system is developed using Flutter, ensuring that it works seamlessly on Android, iOS, and web platforms. Users can access the app from a smartphone, tablet, or browser, with consistent performance and user experience across devices [5].

V. METHODOLOGY

[1] Data Collection: We created a dataset consisting of images and video clips of Indian Sign Language (ISL) gestures, where each gesture represents a commonly used word or phrase. Data was captured under controlled lighting conditions and from different users to increase the model's accuracy and its adaptability across variations.

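The collection script itself is not described in the paper; one plausible sketch (directory layout and key bindings are assumptions) saves webcam frames into a folder per gesture label:

```python
# Sketch: collect labelled gesture images from the webcam (press 's' to save, 'q' to quit).
import os
import cv2

label = "hello"                      # hypothetical gesture label for this recording session
out_dir = os.path.join("dataset", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("collecting: " + label, frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):              # save the current frame under the gesture's folder
        cv2.imwrite(os.path.join(out_dir, f"{label}_{count:04d}.jpg"), frame)
        count += 1
    elif key == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```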

[2] Hand Landmark Detection (Feature Extraction): For feature extraction, we used MediaPipe's hand tracking module, which detects 21 landmarks per hand. Each landmark provides (x, y) coordinates normalized to the frame. These points capture the hand's posture and position, forming the basis for recognizing specific gestures.

[3] Data Preprocessing: The extracted landmarks are normalized to a consistent scale to reduce variation due to hand size or camera distance. Frames are uniformly sampled from videos, and each gesture is assigned a label. Data augmentation such as flipping is applied to increase robustness, and invalid or noisy samples are removed to maintain quality.

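The paper does not give the exact normalization formula; the sketch below shows one common scheme consistent with the description above: landmarks are re-expressed relative to the wrist and scaled, and a mirrored copy serves as the flip augmentation.

```python
# Sketch: one plausible normalization and flip-augmentation scheme for the
# 42-value landmark vectors (the exact formulas are not specified in the paper).
import numpy as np

def normalize(features):
    """Center landmarks on the wrist (landmark 0) and scale to unit size."""
    pts = np.asarray(features, dtype=float).reshape(21, 2)
    pts -= pts[0]                          # translate so the wrist is at the origin
    scale = np.abs(pts).max()
    if scale > 0:
        pts /= scale                       # remove hand-size / camera-distance variation
    return pts.flatten()

def flip_horizontal(features):
    """Mirror a centered hand left-right (negate x) as a simple augmentation."""
    pts = np.asarray(features, dtype=float).reshape(21, 2).copy()
    pts[:, 0] *= -1
    return pts.flatten()

# Example usage, building on the MediaPipe sketch in Section III [4]:
#   raw = landmarks_to_features(image)     # 42 raw values in [0, 1]
#   sample = normalize(raw)                # training feature vector
#   mirrored = flip_horizontal(sample)     # augmentation applied after centering
```
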
[4] Model Training: We used a Random Forest classifier from scikit-learn for training. Each sample contains 42 features, the (x, y) pairs from the 21 landmarks. The dataset is split into training and testing sets (80:20). Grid search and cross-validation are used to fine-tune hyperparameters, and the final model is saved for deployment.

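Extending the earlier scikit-learn sketch, this tuning-and-saving step could look as follows; the parameter grid and the file name "sign_model.pkl" are illustrative assumptions, and the random arrays again stand in for the real dataset.

```python
# Sketch: 80:20 split, grid search with cross-validation, and saving the final model.
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# X: (N, 42) normalized landmark vectors, y: (N,) gesture labels (placeholder data here).
X = np.random.rand(500, 42)
y = np.random.choice(["hello", "thank_you", "yes", "no"], size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"n_estimators": [100, 200], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))

joblib.dump(search.best_estimator_, "sign_model.pkl")   # loaded later by the Flask backend
```
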
[5] Frontend Development with Flutter: The frontend is built using Flutter for cross-platform compatibility. It allows users to capture live gestures or upload images. Detected gestures are shown as text and spoken aloud. The UI is designed to be simple and accessible, with additional features such as job listings and ISL learning modules.

[6] Backend API Development: The Flask backend receives gesture input from the frontend, processes the frame using OpenCV and MediaPipe, and performs inference using the trained model. The response is then sent back to the frontend as a text label, which is converted into speech using a text-to-speech engine.

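The paper does not name the text-to-speech engine or state whether the conversion runs on the device or the server. As one possibility, a server-side sketch using the pyttsx3 library would be:

```python
# Sketch: speak a predicted gesture label with an offline text-to-speech engine.
# pyttsx3 is only one possible choice; the paper does not specify the TTS engine used.
import pyttsx3

def speak(label):
    engine = pyttsx3.init()
    engine.say(label)          # queue the recognized word or phrase
    engine.runAndWait()        # block until playback finishes

speak("hello")                 # e.g., the label returned by the /predict endpoint sketch
```
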
[7] Backend Integration with Flutter: The backend is exposed to the internet using Ngrok, enabling seamless integration with the Flutter frontend across different platforms without deploying to a public server. HTTP requests from the app are routed to Flask via a secure tunnel, making the system accessible in real time from any connected device.

VI. RESULT

As a result, an Android application was created. Glimpses of the generated results are shown in Fig. I (Sign Detection), Fig. II (Job Opportunities Page), and Fig. III (Online Courses Page).

VII. CONCLUSION

The Sign2Speech project presents an innovative and practical approach to bridging the communication gap between hearing and hearing-impaired individuals through the use of real-time Indian Sign Language (ISL) recognition and natural language processing. By integrating advanced technologies such as MediaPipe for hand landmark detection, OpenCV for image processing, and scikit-learn for gesture classification, the system demonstrates high accuracy and responsiveness in detecting hand signs. The use of Flutter for the frontend ensures a smooth, cross-platform user experience across mobile and web applications, while Flask and Ngrok enable flexible and accessible backend operations.

Moreover, the inclusion of features such as text-to-speech conversion, job opportunities, and learning resources makes the application more than just a recognition tool; it becomes a comprehensive platform for empowerment, education, and inclusion. The system supports both live camera detection and image uploads, catering to diverse user scenarios and hardware capabilities.

In conclusion, Sign2Speech is a significant step toward inclusive technology. It not only showcases the potential of machine learning in real-world applications but also contributes positively to social good by enhancing accessibility and creating opportunities for the deaf community. Future improvements include multi-hand detection, sentence formation, and integration with regional languages.

VIII. REFERENCES

[1] M. A. Hearst, "Design recommendations for accessible interactive systems," IEEE Transactions on Human-Machine Systems, vol. 50, no. 2, pp. 145–157, Apr. 2020. doi: 10.1109/THMS.2020.2964214.

[2] P. Molchanov, S. Gupta, K. Kim, and K. Pulli, "Hand gesture recognition with 3D convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 1–7.

[3] F. Zhang, C. Wang, J. Zhu, and H. Lu, "Real-time hand gesture recognition using finger segmentation," in 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018, pp. 3105–3110. doi: 10.1109/ICPR.2018.8545524.

[4] S. Biswas, R. D. Das, and S. Basu, "A real-time hand gesture recognition system for human-computer interaction," in 2019 IEEE Calcutta Conference (CALCON), Kolkata, India, 2019, pp. 298–303. doi: 10.1109/CALCON.2019.8776683.

[5] Google, "MediaPipe: Cross-platform, customizable ML solutions for live and streaming media," [Online]. Available: https://mediapipe.dev.

[6] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 2001, pp. I-511–I-518.
