Abstract—Sign 2 Speech is an innovative system designed to empower the deaf and hard-of-hearing community by enabling seamless communication with hearing individuals. The core functionality of the system is its ability to detect Indian Sign Language (ISL) gestures using computer vision techniques and convert them into spoken words through natural language processing (NLP). The system is built with a robust backend using Flask and Ngrok, which allows for real-time sign language detection via webcam input and makes the system accessible from any device with internet connectivity.

The sign language recognition model is trained using MediaPipe's hand landmark detection, a cutting-edge technology that identifies 21 key hand landmarks in video frames, combined with OpenCV for image/video input/output and webcam capture. The processed data is fed into a machine learning model developed using scikit-learn, ensuring accurate gesture recognition.

The system utilizes advanced machine learning techniques in conjunction with computer vision tools to detect and interpret hand gestures from video input. By leveraging MediaPipe's hand landmark detection, OpenCV for image and video processing, and scikit-learn for model training and evaluation, the system is capable of recognizing complex sign language gestures with high accuracy.

In addition to sign language translation, the system also provides valuable resources for the deaf community, such as job listings and courses for learning sign language. These resources are aimed at improving the quality of life for deaf individuals by promoting employment opportunities and enhancing access to educational tools for sign language learning.

This system not only enables real-time sign language detection but also strives to improve inclusivity and accessibility for the deaf community by addressing gaps in communication, education, and employment.
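To make the pipeline summarized above concrete, the following is a minimal sketch of the scikit-learn training step. It assumes each sample is a flattened vector of the 21 MediaPipe hand landmarks (x and y per point, i.e. 42 features) paired with a gesture label; the file names and the choice of RandomForestClassifier are illustrative assumptions, not details taken from this paper.

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset file: one row per sample, 42 landmark features + label.
data = np.loadtxt("landmarks.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Persist the model so the Flask backend can load it at startup.
joblib.dump(clf, "gesture_model.joblib")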
[1] Flutter: Flutter is an open-source UI software development toolkit created by Google. It allows developers to build natively compiled applications for mobile, web, and desktop using a single codebase. In the Sign 2 Speech project, Flutter is used to build a user-friendly and interactive frontend that works seamlessly across Android devices, iOS devices, and web browsers.
[2] Flask: Flask is a lightweight and versatile web framework written in Python, widely used for developing RESTful APIs and backend applications. In this project, Flask acts as the core backend server responsible for handling incoming HTTP requests from the Flutter frontend. When a user uploads an image or captures a live frame from the camera, the Flask backend receives this input and processes it through the gesture recognition pipeline.
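A minimal sketch of such an endpoint is shown below. The route name /predict, the form field name "frame", and the model file are assumptions for illustration, not details stated in the paper.

import cv2
import joblib
import numpy as np
import mediapipe as mp
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("gesture_model.joblib")  # classifier trained as sketched earlier
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

@app.route("/predict", methods=["POST"])
def predict():
    # The Flutter frontend uploads one frame as multipart form data.
    buf = np.frombuffer(request.files["frame"].read(), np.uint8)
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    result = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return jsonify({"gesture": None, "error": "no hand detected"})
    points = result.multi_hand_landmarks[0].landmark
    features = [v for p in points for v in (p.x, p.y)]  # 21 points -> 42 features
    return jsonify({"gesture": str(model.predict([features])[0])})

if __name__ == "__main__":
    app.run(port=5000)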
[3] Ngrok: Ngrok is a powerful tunneling service that allows developers to expose their local servers to the internet via secure public URLs. It is used in the Sign 2 Speech system to make the locally hosted Flask backend accessible to the Flutter frontend running on mobile or web platforms, regardless of network constraints. Ngrok creates an encrypted tunnel from a publicly accessible domain to the localhost.
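The paper does not state how the tunnel is opened; one common way from Python is the pyngrok wrapper, sketched below under that assumption (running the CLI command ngrok http 5000 achieves the same result).

from pyngrok import ngrok

# Open an encrypted tunnel from a public ngrok URL to the local Flask port.
tunnel = ngrok.connect(5000)
print("Backend reachable at:", tunnel.public_url)
# This public https URL is what the Flutter frontend uses as its API base URL.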
[4] MediaPipe: MediaPipe is a cross-platform, open-source framework developed by Google for building multimodal (e.g., video, audio) machine learning pipelines. One of its most significant modules, hand landmark detection, locates 21 key landmarks on each hand in every video frame and forms the foundation of this system's gesture recognition.
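A minimal sketch of this module on live webcam input follows; the drawing calls are only for visualization, and the parameter values are illustrative.

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            for hand in result.multi_hand_landmarks:
                # Each detected hand yields the 21 (x, y, z) landmarks noted above.
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Sign 2 Speech landmarks", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()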
[2] Intuitive and Accessible User Interface: The application is designed with a user-friendly Flutter-based interface that ensures ease of navigation for deaf and mute users. Clear layouts and accessible controls provide a seamless experience, even for users new to digital platforms.[1]

[3] Interactive Learning Modules: Users can access structured, video-based tutorials that teach ISL from beginner to advanced levels. These modules use engaging animations and real-life gesture examples to make learning effective and enjoyable.[2]

[4] Courses to Learn Indian Sign Language: To promote awareness and learning, the app provides curated links and resources for online and offline courses on Indian Sign Language. This helps individuals who want to learn ISL for communication, professional development, or educational purposes. Encouraging ISL literacy among non-signers is a step towards a more inclusive society.[1]

[5] Cross-Platform Compatibility: The entire system is developed using Flutter, ensuring that it works seamlessly on Android, iOS, and web platforms. Users can access the app from a smartphone, tablet, or browser, maintaining consistent performance and user experience across devices.[5]

V. METHODOLOGY

[1] Data Collection:
We created a dataset consisting of images and video clips of Indian Sign Language (ISL) gestures. Each gesture represents a commonly used word or phrase. Data was captured under controlled lighting conditions from different users to increase the model's accuracy and adaptability across variations.
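A minimal sketch of this capture step is given below, assuming one folder per gesture label; the folder layout, key bindings, and sample count are illustrative choices, not details from the paper.

import os
import cv2

label = "hello"  # hypothetical gesture label
out_dir = os.path.join("dataset", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while count < 200:  # e.g. 200 samples per gesture, per user
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Collecting: " + label, frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):  # press 's' to save the current frame
        cv2.imwrite(os.path.join(out_dir, f"{label}_{count:03d}.jpg"), frame)
        count += 1
    elif key == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()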
VIII. REFERENCES
[1] M. A. Hearst, "Design recommendations for accessible interactive systems," IEEE Transactions on Human-Machine Systems, vol. 50, no. 2, pp. 145–157, Apr. 2020, doi: 10.1109/THMS.2020.2964214.